ABSTRACT

Context. The radio interferometer measurement equation (RIME), especially in its 2 x 2 form, has provided a comprehensive matrix- 
based formalism for describing classical radio interferometry and polarimetry, as shown in the previous three papers of this series. 
However, recent practical and theoretical developments, such as phased array feeds (PAFs), aperture arrays (AAs) and wide-field 
polarimetry, are exposing limitations of the formalism. 

Aims. This paper aims to develop a more general formalism that can be used to both clearly define the limitations of the matrix RIME, 
and to describe observational scenarios that lie outside these limitations. 

Methods. Some assumptions underlying the matrix RIME are explicated and analysed in detail. To this purpose, an array correlation 
matrix (ACM) formalism is explored. This proves of limited use; it is shown that matrix algebra is simply not a sufficiently flexible 
tool for the job. To overcome these limitations, a more general formalism based on tensors and the Einstein notation is proposed and 
explored both theoretically, and with a view to practical implementations. 

Results. The tensor formalism elegantly yields generalized RIMEs describing beamforming, mutual coupling, and wide-field po- 
larimetry in one equation. It is shown that under the explicated assumptions, tensor equations reduce to the 2x2 RIME. From a 
practical point of view, some methods for implementing tensor equations in an optimal way are proposed and analysed. 

Conclusions. The tensor RIME is a powerful means of describing observational scenarios not amenable to the matrix RIME. Even in
cases where the latter remains applicable, the tensor formalism can be a valuable tool for understanding the limits of such applicability. 
Introduction



Since its formulation by Hamaker et al. (1996), the radio interferometer
measurement equation (RIME) has been adopted by the calibration and
imaging algorithm development community as the mathematical formalism
of choice when describing new methods and techniques for processing
radio interferometric data. In its 2 x 2 matrix version (also known as
the Jones formalism, or JF) developed by Hamaker (2000), it has achieved
remarkable simplicity and economy of form.

Recent developments, however, have begun to expose some 
limitations of the matrix RIME. In particular, phased array feeds 
(PAFs) and aperture arrays (AAs), while perfectly amenable to 
a JF on the systems level (in the sense that the response of a pair 
of PAF or AA compound beams can be described by a 2 x 2 
Jones matrix), do not seem to fit the same formalism on the el- 
ement level. In general, since a Jones matrix essentially maps 
two complex electromagnetic field (EMF) amplitudes onto two 
feed voltages, it cannot directly describe a system incorporating
more than two receptors per station (as in, e.g., the "tripole"
design of Bergman et al. 2003). And on the flip side of the coin,
Carozzi & Woan (2009) have shown that two complex EMF amplitudes
are insufficient - even when dealing with only two receptors -
to properly describe wide-field polarimetry, and that a
three-dimensional Wolf formalism (WF) is required. Other "awk- 
ward" effects that don't seem to fit into the JF include mutual 
coupling of receptors. 



These circumstances seem to suggest that the JF is a special 
case of some more general formalism, one that is valid only un- 
der certain conditions. The second part of this paper presents one 
such generalized formalism. However, given the JF's inherent elegance
and simplicity, the degree to which it is understood in the
community, and (pragmatically but very importantly) the avail- 
ability of software implementations, it will in any case continue 
to be a very useful tool. It is therefore important to establish the 
precise limits of applicability of the JF, which in turn can only 
be done in the context of a broader theory. 

The first part of this paper therefore re-examines the basic 
tenets of the RIME, and highlights some underlying assumptions 
that have not been made explicit previously. It then proposes a 
generalized formalism based on tensors and Einstein notation. 
As an illustration, some tensor RIMEs are then formulated, for 
observational scenarios that are not amenable to the JF. The ten- 
sor formalism is shown to reduce to the JF under the previously 
established assumptions. Finally, the paper discusses some prac- 
tical aspects of implementing such a formalism in software. 



1. Why is the RIME 2x2? 

As a starting point, I will consider the RIME formulations derived
in Paper I of this series (Smirnov 2011a). A few crucial
equations are reproduced here for reference. Firstly, the RIME 
of a point source gives the visibility matrix measured by interferometer
pq as the product of 2 x 2 matrices: the intrinsic source
brightness matrix $B$, and the per-antenna Jones matrices $J_p$ and
$J_q$:

$$V_{pq} = J_p B J_q^H. \qquad (1)$$



The Jones matrix $J_p$ describes the total signal propagation
path from source to antenna p. For any specific observation and
instrument, it is commonly represented by a Jones chain of individual
propagation effects:

$$J_p = J_{p,n} J_{p,n-1} \cdots J_{p,1}, \qquad (2)$$

which leads to the onion form of the RIME:

$$V_{pq} = J_{p,n} ( \cdots J_{p,2} ( J_{p,1} B J_{q,1}^H ) J_{q,2}^H \cdots ) J_{q,n}^H. \qquad (3)$$
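
As a minimal numerical sketch of the Jones chain and the onion form (plain Python with no external libraries; the matrix values and the helper names `matmul`, `herm`, `chain` and `onion` are illustrative assumptions and do not come from any RIME software package), one can check that collapsing each chain into a single Jones matrix and then applying the point-source RIME gives the same visibility matrix as applying the layers one at a time:

```python
from functools import reduce

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def herm(a):
    """Hermitian (conjugate) transpose of a nested-list matrix."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def chain(terms):
    """Collapse [J_1, ..., J_n] into the total Jones matrix J_n ... J_1."""
    return reduce(lambda acc, j: matmul(j, acc), terms)

def onion(terms_p, B, terms_q):
    """Wrap B in one layer J_{p,i} (...) J_{q,i}^H at a time, innermost first."""
    V = B
    for jp, jq in zip(terms_p, terms_q):
        V = matmul(matmul(jp, V), herm(jq))
    return V

# Hypothetical two-term chains, listed innermost (closest to the source) first.
terms_p = [[[1, 0.1j], [0.1j, 1]], [[2, 0], [0, 0.5j]]]
terms_q = [[[1, -0.2], [0.2, 1]], [[1j, 0], [0, 3]]]
B = [[1.2, 0.1 + 0.05j], [0.1 - 0.05j, 0.8]]  # Hermitian brightness matrix

V_onion = onion(terms_p, B, terms_q)
V_total = matmul(matmul(chain(terms_p), B), herm(chain(terms_q)))
max_diff = max(abs(V_onion[i][j] - V_total[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # zero up to floating-point rounding
```

The sketch assumes both chains have the same number of terms; a missing effect on one side can be represented by a unity matrix.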



The individual terms in the matrix product above correspond 
to different propagation effects along the signal path. Any practical
application of the RIME requires a set of matrices describing
specific effects, which are then inserted into Eq. (3). These
specific matrices tend to have standard single-letter designations
(see e.g. Noordam & Smirnov 2010, Sect. 7.3). In particular, the
$K_p$ term$^1$ describes the geometric (and fringe stopped) phase
delay to antenna p, $K_p = e^{-2\pi i (u_p l + v_p m + w_p (n-1))}$. The rest of the
Jones chain can be partitioned into direction-independent effects
(DIEs, or uv-plane effects) on the right, and direction-dependent
effects (DDEs, or image-plane effects) on the left, designated
as$^2$ $G_p$ and $E_p$. We can then write a RIME for multiple discrete
sources as



$$V_{pq} = G_p \left( \sum_s E_{sp} K_{sp} B_s K_{sq}^H E_{sq}^H \right) G_q^H. \qquad (4)$$



Substituting the exponent for the $K_p K_q^H$ term then gives us
the Fourier transform (FT) kernel in the full-sky RIME:

$$V_{pq} = G_p \left( \iint_{lm} E_p B E_q^H \, e^{-2\pi i (u_{pq} l + v_{pq} m)} \, dl \, dm \right) G_q^H, \qquad (5)$$



where all matrix terms$^3$ under the integration sign are functions
of direction $l, m$.

The first fundamental assumption of the RIME is linearity.$^4$
The second assumption is that the signal is measured in a narrow
enough frequency band to be essentially monochromatic, and at
short enough timescales that $J_p$ is essentially constant; departures
from these assumptions cause smearing or decoherence,
which has already been reviewed in Paper I (Smirnov 2011a,
Sect. 5.2). These assumptions are obvious and well-understood.
It is more interesting to consider why the RIME can describe
instrumental response by a 2 x 2 Jones matrix. Any such matrix
corresponds to a linear transform of two complex numbers into
two complex numbers, so why two and not some other number?
This actually rests on some further assumptions. 



1 Following the typographical conventions of Paper I (Smirnov
2011a, Sect. 1.4), I use normal-weight italics for $K_p$ to emphasize the
fact that it is a scalar term rather than a full matrix.

2 Strictly speaking, $G_p$ encompasses the DIEs up to and not including
the leftmost DDE.

3 Note that the $E_p$ term in this equation also incorporates the w-term,
$W_p = e^{-2\pi i w_p (n-1)} / \sqrt{n}$, which allows us to treat the integration as a
two-dimensional FT.

4 Real-life propagation effects are linear either by nature or design, 
with the exception of a few troublesome regimes, e.g. when using cor- 
relators with a low number of bits. 



1.1. Dual receptors 

In general, an EMF is described by a complex 3-vector $\epsilon$.
However, an EMF propagating as a transverse plane wave can
be fully described by only two complex numbers, $e = (e_x, e_y)^T$,
corresponding to the first two components of $\epsilon$ in a coordinate
system where the third axis is along the direction of propagation.
At the antenna feed, the EMF is converted into two complex voltages
$v = (v_a, v_b)^T$. Given a transverse plane wave, two linearly
independent complex measurements are necessary and sufficient 
to fully sample the polarization state of the signal. 

In other words, a 2 x 2 RIME works because we build dual- 
receptor telescopes; we do the latter because two receptors are 
what's needed to fully measure the polarization state of a trans- 
verse plane wave. PAFs and AAs have more than two receptors, 
but once these have been electronically combined by a beam- 
former into a pair of compound beams, any such pair of beams 
can be considered as a virtual receptor pair for the purposes of 
the RIME. 

Carozzi & Woan (2009) have pointed out that in the wide-field
case, the EMF arriving from off-axis sources is no longer
parallel to the plane of the receptors, so we can no longer mea- 
sure the polarization state with the same fidelity as for the on- 
axis case. In the extreme case of a source lying in the plane of 
the receptors, the loss of polarization information is irrecover- 
able. Consequently, proper wide-field polarimetry requires three 
receptors. With only two receptors, the loss-of-fidelity effect can 
be described by a Jones matrix of its own (which the authors 
designate as $T^{(xy)}$), but a fully three-dimensional formalism is
required to derive $T^{(xy)}$ itself.



1.2. The closed system assumption 

When going from the basic RIME of Eq. (1) to Eq. (3), we decompose
the total Jones matrix into a chain of propagation effects
associated with the signal path from source to station p.
This is the traditional way of applying the RIME pioneered in
the original paper (Hamaker et al. 1996), and continued in subsequent
literature describing applications of the RIME (Noordam 1996;
Rau et al. 2009; Myers et al. 2010; Smirnov 2011a).

Consider an application of Eq. (3) to real life. Depending
on the application, individual components of the Jones chains
$J_{p,i}$ may be derived from a priori physical considerations and
models (e.g. models of the ionosphere), and/or solved for in a
closed-loop manner, such as during self-calibration. Crucially,
Eq. (3) postulates that the signal measured by interferometer pq
is fully described by the source brightness $B$ and the set of matrices
$J_{p,i}$ and $J_{q,j}$, and does not depend on any effect in the signal
propagation path to any third antenna r. If, however, antenna r
is somehow electromagnetically coupled to p and/or q, the measured
voltages $v_p$ and $v_q$ will contain a contribution received via
the signal path to r, and thus will have a non-trivial dependence
on, e.g., $J_{r,i}$ that cannot be described by the 2 x 2 formalism
alone.

To be absolutely clear, the basic RIME of Eq. (1) still holds
as long as any such coupling is linear. In other words, there
is always a single effective $J_p$ that ties the voltage $v_p$ to the
source EMF vector $e$. In some applications, e.g. traditional self-cal,
where we solve for this $J_p$ in a closed-loop manner, the
distinction of whether $J_p$ depends on propagation path p only,
or whether other effects are mixed in, is entirely irrelevant.
However, when constructing more complicated RIMEs (as is being
done currently for simulation of new instruments, or for new
calibration techniques), an implicit assumption is made that we
may decompose $J_p$ into per-station Jones chains, as in Eq. (3).
This is tantamount to assuming that each station forms a closed 
system. 

Consider the effect of electrical cross-talk, or mutual cou- 
pling in a densely-packed array. If cross-talk or coupling is re- 
stricted to the two receptors within a station, such a station 
forms a closed system. For a closed system, the Jones chain 
approach is perfectly valid. If, however, cross-talk occurs between
receptors associated with different stations, the receptor
voltages $v_p$ will not only depend on $J_{p,i}$, but also on $J_{q,j}$,
$J_{r,k}$, etc. (See Sect. 2.1 for a more thorough discussion of this
point.) With the emergence of AA and PAF designs for new
instruments, we can no longer safely assume that two receptors
form a closed system; in fact, even traditional interferometers
can suffer from troublesome cross-talk in certain situations
(Subrahmanyan & Deshpande 2004).

Some formulations of the RIME can incorporate coupling
within each pair of stations p and q via an additional 4 x 4
matrix (see e.g. Noordam 1996) used to describe multiplicative
interferometer-based effects. By definition, this approach cannot
incorporate coupling with a third station r; any such coupling
requires additional formulations that are extrinsic to the 2 x 2
RIME, such as the ACM formalism of Sect. 2 or the tensor formalism
that is the main subject of this paper.

The closed system assumption has not been made explicit in 
the literature. This is perhaps due to the fact that the RIME is 
nominally formulated for a single interferometer pq. Consider, 
however, that for an interferometer array composed of N stations,
the "full" RIME is actually a set of $N(N-1)/2$ equations.
By treating the equations independently, we're implicitly
assuming that each equation corresponds to a closed system. 
The higher-order formalisms derived below will make this issue 
clear. 

1.3. The colocation assumption 

A final seldom explicated assumption is that each pair of receptors
is colocated. While not required for the general RIME formulation
of Eq. (1) per se, colocation becomes important (and is
quietly assumed) in specific applications for two reasons. Firstly,
it allows us to consider the geometric phase delay of both receptors
to be the same, which makes the $K_p$ matrix scalar, and
allows us to commute it around the Jones chain. $K_p$ and $K_q^H$ can
then be commuted together to form the FT kernel, which is essential
for deriving the full-sky variants of the RIME such as
Eq. (5). And secondly, although the basic RIME of Eq. (1) may
be formulated for any four arbitrarily-located receptors, when
we proceed to decompose $J_p$ into per-station terms, we implicitly
assume a single propagation path per each pair of receptors
(same atmosphere, etc.), which implies colocation. In practice 
the second consideration may be negligible, but not so the first. 

Classical single-feed dish designs have colocated receptors 
as a matter of course, but a PAF system such as APERTIF
(van Cappellen & Bakker 2010) or ASKAP (Johnston et al. 2008)
typically has horizontally and vertically oriented dipoles
at slightly different positions. The effective phase centres of the
beamformed signals may be different yet again. The $K_p$ matrix
then becomes diagonal but not scalar, and can no longer be commuted
around the RIME. In principle, we can shoehorn the case
of non-colocated receptors into the RIME formulations by picking
a reference point (e.g., the mid-point between the two receptors),
and decomposing $K_p$ into a product of a scalar phase delay
corresponding to the reference point, and a non-scalar differential
delay term: $K_p = K_p^{(0)} K_p^{(\delta)}$. The scalar term $K_p^{(0)}$ can then be
commuted around the RIME to yield the FT kernel of Eq. (5),
while $K_p^{(\delta)}$ becomes a DIE that can be absorbed into the overall
phase calibration (or cause instrumental V or U polarization if
it isn't). The exact form of $K_p^{(0)}$ and $K_p^{(\delta)}$ can be derived from
geometric considerations (or analysis of instrument optics), but
such a derivation is extrinsic to the RIME per se. This situation
is similar to that of the $T^{(xy)}$ term derived by Carozzi & Woan
(2009), and is another reason behind the multidimensional formalism
proposed later on in this paper.
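
The split of a diagonal phase term into a scalar reference delay and a differential delay can be illustrated with a short sketch. The phase values, the Jones term `G`, and the helper names are hypothetical; the only assumption taken from the text is that the reference point is the mid-point between the two receptors, so that its phase is the mean of the two receptor phases. The scalar part commutes with any matrix by construction, whereas the differential part in general does not:

```python
import cmath

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def split_delay(phi_a, phi_b):
    """Split diag(e^{-i phi_a}, e^{-i phi_b}) into scalar and differential parts."""
    phi_0 = 0.5 * (phi_a + phi_b)        # phase at the mid-point reference
    k_0 = cmath.exp(-1j * phi_0)         # scalar term, commutes with everything
    k_delta = [[cmath.exp(-1j * (phi_a - phi_0)), 0],
               [0, cmath.exp(-1j * (phi_b - phi_0))]]
    return k_0, k_delta

# Hypothetical geometric phases (radians) of two non-colocated receptors.
phi_a, phi_b = 0.3, 0.5
k_0, k_delta = split_delay(phi_a, phi_b)

# The product of the two parts reproduces the original diagonal phase matrix.
K = [[cmath.exp(-1j * phi_a), 0], [0, cmath.exp(-1j * phi_b)]]
K_rebuilt = [[k_0 * x for x in row] for row in k_delta]
rebuild_error = max_abs_diff(K, K_rebuilt)

# The differential part does not commute with a non-diagonal Jones term.
G = [[1, 0.2], [0.3j, 1]]
commutator_size = max_abs_diff(matmul(k_delta, G), matmul(G, k_delta))
print(rebuild_error, commutator_size)
```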

Note that conventional FT-based imaging algorithms also as- 
sume colocated receptors when converting visibilities to Stokes 
parameters. For example, the conventional formulae for I and U,

$$I = \frac{1}{2}(V_{xx} + V_{yy}), \quad U = \frac{1}{2}(V_{xy} + V_{yx}),$$

implicitly assume that the constituent visibilities are mea- 
sured on the same baseline. Some leeway is acceptable here: 
since the measured visibilities are additionally convolved by 
the aperture illumination function (AIF), the formulae above 
still apply, as long as the degree of non-colocation is negligi- 
ble compared to the effective station size. Note also that some 
of the novel approaches of expectation-maximization imaging
(Leshem & van der Veen 2000; Levanda & Leshem 2010) formulate
the imaging problem in such a way that the colocation
requirement can probably be done away with altogether. 



2. The array correlation matrix formalism 

I will first explore the limitations of the closed-system assumption
a little bit further. Consider an AA, PAF, or conventional
closely-packed interferometer array where mutual coupling affects
more than two receptors at a time. Such an array cannot
be partitioned into pairs of receptors, with each pair forming a
closed system. The normal 2 x 2 RIME of Eq. (3) is then no
longer valid. An alternative is to describe the response of such
an array in terms of a single array correlation matrix (ACM, also
called the signal voltage covariance matrix), as has been done by
Wijnholds (2010) for AAs, and Warnick et al. (2011) for PAFs.
Since the ACM provides a valuable conceptual link between the 
2x2 RIME and the tensor formalism described later in this pa- 
per, I will consider it in some detail in this section. 

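Let's assume an arbitrary set of n receptors (e.g. dipoles) in
arbitrary orientation, and a single point source of radiation. If
we represent the voltage response of the full array by the column
vector $v = (v_1 \ldots v_n)^T$, then we can express it (assuming linearity
as usual) as the product of an n x 2 matrix with the source EMF
vector $e$:

$$v = Je = \begin{pmatrix} j_{x1} & j_{y1} \\ \vdots & \vdots \\ j_{xn} & j_{yn} \end{pmatrix} \begin{pmatrix} e_x \\ e_y \end{pmatrix}.$$

If all pairwise combinations of receptors are correlated, we
end up with an n x n ACM$^5$ $V$:

$$V = \langle v v^H \rangle = \begin{pmatrix} j_{x1} & j_{y1} \\ \vdots & \vdots \\ j_{xn} & j_{yn} \end{pmatrix} B \begin{pmatrix} j_{x1}^* & \cdots & j_{xn}^* \\ j_{y1}^* & \cdots & j_{yn}^* \end{pmatrix} = J B J^H, \qquad (6)$$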



5 I will use boldface roman capitals, such as V, for matrices other
than 2 x 2. As per the conventions of Paper I (Smirnov 2011a, Sect. 1.4),
2 x 2 Jones matrices are indicated by boldface italic capitals (J).
Brightness, coherency and visibility matrices (as well as tensors, in- 
troduced later) are indicated by sans-serif capitals, e.g. B. 
where $B$ is the 2 x 2 source brightness matrix, and $J$ is an n x 2
Jones-like matrix for the entire array. Note that for n = 2, this
equation becomes the autocorrelation matrix given by the RIME
of Eq. (1) with p = q.

To derive the J matrix for a given observation, we need to de- 
compose it into a product of "physical" terms that we can anal- 
yse individually. As an example, let's consider only three effects: 
primary beam (PB) gain, geometric phase, and cross-talk. The J 
matrix can then be decomposed as follows: 



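$$v = QKEe = \begin{pmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & & \vdots \\ q_{n1} & \cdots & q_{nn} \end{pmatrix} \begin{pmatrix} K_1 & & \\ & \ddots & \\ & & K_n \end{pmatrix} E e, \qquad (7)$$

and the full ME then becomes:

$$V = Q K E B E^H K^H Q^H. \qquad (8)$$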



The n x 2 E matrix corresponds to the PB gain, the n x n di- 
agonal K matrix corresponds to the individual phase terms (dif- 
ferent for every receptor), and the n x n Q matrix corresponds 
to the cross-talk and/or mutual coupling between the receptors. 
The equation does not include an explicit term for the complex 
receiver gains: these can be described either by a separate diag- 
onal matrix, or absorbed into Q. 

In the case of a classical array of dishes, we have n = 2m 
receptors, with each adjacent pair forming a closed system. In 
this case, Q becomes block-diagonal - that is, composed of 2 x 2 
blocks along the diagonal, equivalent to the "leakage" matrices
of the original RIME formulation (Hamaker et al. 1996). K becomes
block-scalar ($K_1 = K_2$, $K_3 = K_4$, ...), and Eq. (8) dissolves
into the familiar set of $m(m-1)/2$ independent RIMEs of Eq. (3).
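
The way the full-array equation falls apart into per-baseline RIMEs can be checked numerically. The following is a sketch under stated assumptions: m = 3 hypothetical stations with made-up leakage blocks, phases and beam gains, and illustrative helper names (`embed_blocks`, `acm`, `block`, `per_station_jones`). With a block-diagonal cross-talk matrix and a block-scalar phase matrix, every 2 x 2 block of the array correlation matrix should equal the ordinary 2 x 2 RIME for the corresponding pair of stations:

```python
import cmath

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def herm(a):
    """Hermitian (conjugate) transpose of a nested-list matrix."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def embed_blocks(blocks):
    """Place 2x2 blocks along the diagonal of a 2m x 2m matrix."""
    n = 2 * len(blocks)
    M = [[0] * n for _ in range(n)]
    for p, blk in enumerate(blocks):
        for i in range(2):
            for j in range(2):
                M[2 * p + i][2 * p + j] = blk[i][j]
    return M

def acm(Q, K, E, B):
    """Array correlation matrix V = Q K E B E^H K^H Q^H."""
    J = matmul(matmul(Q, K), E)            # n x 2 Jones-like matrix
    return matmul(matmul(J, B), herm(J))   # n x n

def block(V, p, q):
    """2x2 block of V belonging to stations p and q."""
    return [row[2 * q:2 * q + 2] for row in V[2 * p:2 * p + 2]]

# Hypothetical per-station terms for m = 3 stations (n = 6 receptors).
leak = [[[1, 0.05], [-0.05, 1]], [[1, 0.02j], [0.02j, 1]], [[1, 0.1], [0.1j, 1]]]
phase = [cmath.exp(-1j * phi) for phi in (0.0, 0.7, 1.9)]
beam = [[[1.0, 0.0], [0.0, 0.9]], [[0.8, 0.1], [0.0, 1.0]], [[1.1, 0.0], [0.2, 0.7]]]
B = [[1.2, 0.1 + 0.05j], [0.1 - 0.05j, 0.8]]

Q = embed_blocks(leak)                               # block-diagonal cross-talk
K = embed_blocks([[[k, 0], [0, k]] for k in phase])  # block-scalar phases
E = [row for b in beam for row in b]                 # n x 2 beam gains
V = acm(Q, K, E, B)

def per_station_jones(p):
    """2x2 Jones matrix of station p: leakage x scalar phase x beam."""
    return matmul(leak[p], [[phase[p] * x for x in row] for row in beam[p]])

worst = 0.0
for p in range(3):
    for q in range(3):
        V_pq = matmul(matmul(per_station_jones(p), B), herm(per_station_jones(q)))
        V_blk = block(V, p, q)
        worst = max(worst, max(abs(V_blk[i][j] - V_pq[i][j])
                               for i in range(2) for j in range(2)))
print(worst)  # zero up to floating-point rounding
```

Putting a non-zero entry of `Q` outside the 2 x 2 diagonal blocks breaks this equality, which is the closed-system assumption failing.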

Note that the ordering of terms in this equation is not entirely 
physical - in the actual signal path, the phase delay represented 
by K occurs before the beam response E. To be even more pre- 
cise, phase delay may be a combination of geometric phase that 
occurs "in space" before E, and fringe stopping that occurs "in 
the correlator" after Q. Such an ordering of effects becomes very 
awkward to describe with this matrix formalism, but will be fully 
addressed by the tensor formalism of Sect. 3.

2.1. Image-plane effects and cross-talk 

If we now consider additional image-plane effects$^6$, things get
even more awkward. In the simple case, if these effects do not
vary over the array (i.e. for a given direction, are the same along
each line of sight to each receptor), we can replace the $e$ vector
in Eq. (7) by $Ze$, where $Z$ is a Jones matrix describing the
image-plane effect. We can then combine the n x 2 $E$ matrix and
the 2 x 2 $Z$ matrix into a single term $R = EZ$, which is an n x 2
matrix describing the voltage gain and all other image-plane effects,
and define an n x n "apparent sky" matrix as $B_{app} = R B R^H$.
Equation (8) then becomes

$$V = Q K B_{app} K^H Q^H.$$

If image-plane effects do vary across receptors, then a matrix 
formalism is no longer sufficient! The expression for each receptor
p must somehow incorporate its own $Z_p$ Jones matrix. We



6 The previous papers in this series (Smirnov 2011a,b,c) also refer
to these as direction-dependent effects (DDEs). As far as the present 
paper is concerned, the important aspect of these effects is that they 
arise before the receptor, rather than their directional-dependence per 
se. I will therefore use the alternative term image-plane effects in order 
to emphasize this. 



need to describe signal propagation along n lines of sight, and 
each propagation effect needs a 2 x 2 matrix. A full description 
of the image-plane term then needs n x 2 x 2 complex numbers. 

Another way to look at this conundrum is as follows. As long 
as each receptor pair is colocated and forms a closed system (as 
is the case for traditional interferometers), the voltage response 
of each receptor depends only on the EMF vector at its loca- 
tion. The correlations between stations p and r can then be fully 
described in terms of the EMF vectors at locations p and r. This 
allows us to write the RIME in a matrix form, as in Eq. (3) or (8).
In the presence of significant cross-talk between more than two 
receptors, the voltage response of each receptor depends on the 
EMF vectors at multiple locations. In effect, the cross-talk term 
$Q$ in Eq. (8) "scrambles up" image-plane effects between different
receptor locations; describing this is beyond the capability of
ordinary matrix algebra. 

In practice, receptors that are sufficiently separated to see 
any conceivable difference in image-plane effects would be 
too far apart for any mutual coupling, while today's all-digital 
designs have also eliminated most possibilities of cross-talk. 
Mathematically, this corresponds to $Z_p \approx Z_r$ where $q_{pr} \neq 0$,
which means that image-plane effects can, in principle, be shoehorned
into the matrix formalism of Eq. (8). This, however, does
not make the formalism any less clumsy - we still need to de- 
scribe different image-plane effects for far-apart receptors, and 
mutual coupling for close-together ones, and the two effects to- 
gether are difficult to shoehorn into ordinary matrix multiplica- 
tion. 



2.2. The non-paraxial case 



Carozzi & Woan (2009) have shown that the EMF can only be
accurately described by a 2-vector in the paraxial or nearly-paraxial
case. For wide-field polarimetry, we must describe the
EMF by a rank-3 column vector $e = (e_x, e_y, e_z)^T$, and the sky
brightness distribution by a 3 x 3 matrix $B^{(3)} = \langle e e^H \rangle$. The intrinsic
sky brightness is still given by a 2 x 2 matrix $B$; once an
xyz Cartesian system is established, this maps to $B^{(3)}$ via a 3 x 2
transformation matrix $T$ (ibid., Eqs. (20) and (21)):

$$B^{(3)} = T B T^T.$$



It is straightforward to incorporate this into the ACM formalism:
the $B$ term of Eqs. (6) and (8) is replaced by $B^{(3)}$, and the dimensions
of the $E$ matrix become n x 3.



3. A tensor formalism for the RIME 

The ACM formalism of the previous section turns out to be only 
marginally useful for the purposes of this paper. It does aid in 
understanding the effect of mutual coupling and the closed sys- 
tem assumption a little bit better, but it is much too clumsy in 
describing image-plane effects, principally because the rules of 
matrix multiplication are too rigid to represent this particular 
kind of linear transform. What we need is a more flexible scheme 
for describing arbitrary multi-linear transforms, one that can go 
beyond vectors and matrices. Fortunately, mathematicians have 
already developed just such an apparatus in the form of tensor 
algebra. In this section, I will apply this to derive a generalized 
multi-dimensional RIME. 
3.1. Tensors and the Einstein notation: a primer

Tensors are a large and sprawling subject, and one not particularly
familiar to radio astronomers at large. Appendix A provides
a brief but formal description of the concepts required for the for- 
mulations of this paper. This is intended for the in-depth reader 
(and to provide rigorous mathematical underpinnings for what 
follows). For an executive overview, only a few basic concepts 
are sufficient: 

Tensors are a generalization of vectors and matrices. An (n, m)-type
tensor is given by an (n + m)-dimensional array of numbers,
and written using n upper and m lower indices: e.g. $T^{i_1 \ldots i_n}_{j_1 \ldots j_m}$.

Superscripts are indices just like subscripts, and not exponentiation!$^7$
For example, a vector is typically a (1,0)-type tensor,
denoted as $x^i$. A matrix is a (1,1)-type tensor, denoted as $A^i_j$.

Upper and lower tensor indices are quite distinct, in that they
determine how the components of a tensor behave under coordinate
transforms. Upper indices are called contravariant, since
components with an upper index (such as the components of a
vector $x^i$) transform reciprocally to the coordinate frames. As a
simple example, consider a "new" coordinate frame whose basis
vectors are scaled up by a factor of $a$ with respect to those of
the "old" frame. In the "new" frame, the same vector is then described
by coordinate components that are scaled by a factor of
$a^{-1}$ w.r.t. the "old" components. By contrast, for a linear form $f_i$
(that is, a linear function mapping vectors to scalars), the "new"
components are scaled by a factor of $a$. Lower indices are thus
said to be covariant.

In physical terms, upper indices tend to refer to vectors, and 
lower indices to linear functions on vectors. An n x n matrix 
can be thought of as a "vector" of n linear functions on vectors, 
and thus has one upper and one lower index in tensor notation, 
and transforms both co- and contravariantly. This is manifest in 
the familiar $T^{-1}AT$ (or $TAT^{-1}$, depending which way the coordinate
transform matrix $T$ is defined) formula for matrix coordinate
transforms. For higher-ranked tensors, the general rules for
coordinate transforms are covered in Sect. A.2.1.

Einstein notation (or Einstein summation) is a convention
whereby repeated upper and lower indices in a product of tensors
are implicitly summed over. For example,

$$y^i = A^i_j x^j = \sum_{j=1}^{N} A^i_j x^j$$

is a way to write the matrix/vector product $y = Ax$ in
Einstein notation. The j index is a summation index since it is
repeated, and the i index is a free index. Another useful convention
is to use Greek letters for the summation indices. For
example, a matrix product may be written as $C^i_j = A^i_\alpha B^\alpha_j$.

The tensor conjugate is a generalization of the Hermitian transpose.
This is indicated by a bar over the symbol and a swapping
of the upper and lower indices. For example, $\bar{x}_i$ is the conjugate
of $x^i$, and $\bar{A}^i_j$ is the conjugate of $A^j_i$.
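
For readers who prefer code to index notation, the conventions above can be spelled out as explicit loops. This is only an illustrative sketch in plain Python: tensors are stored as nested lists with the upper index first, and the function names are my own rather than part of any library. Each repeated index becomes a sum, each free index becomes a loop over output components, and the tensor conjugate swaps the two indices while conjugating:

```python
def contract_vec(A, x):
    """y^i = A^i_j x^j: j is repeated (summed over), i is free."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def contract_mat(A, B):
    """C^i_j = A^i_alpha B^alpha_j: alpha is summed over, i and j are free."""
    return [[sum(A[i][a] * B[a][j] for a in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def conjugate(A):
    """Tensor conjugate of a (1,1)-type tensor: the (i, j) component of the
    result is the complex conjugate of the (j, i) component of A."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
x = [1, 1j]
y = contract_vec(A, x)                  # matrix/vector product y = Ax
C = contract_mat(A, [[0, 1], [1, 0]])   # matrix product (swaps the columns of A)
A_bar = conjugate([[1, 2j], [3, 4]])    # Hermitian transpose
print(y, C, A_bar)
```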

3.2. Recasting the RIME in tensor notation 

As an exercise, let's recast the basic RIME of Eq. (1) using tensor
notation. This essentially repeats the derivations of Paper I
(Smirnov 2011a) using tensor terminology (compare to Sect. 1
therein). 

7 This is the way they will be used in this paper from this point on, 
with the exception of small integer powers (e.g. I 2 ), where exponentia- 
tion is obviously implied. 
For starters, we must define the underlying vector space.
The classical Jones formalism (JF) corresponds to rank-2 vectors,
i.e. the $\mathbb{C}^2$ space. We can also use $\mathbb{C}^3$ space instead, which
results in a version of the Wolf formalism (WF) suggested by
Carozzi & Woan (2009). Remarkably, both formulations look
exactly the same in tensor notation, the only difference being 
the implicit range of the tensor indices. I'll stick to the familiar 
terminology of the JF here, but the same analysis applies to the 
WF. 

An EMF vector is then just a (1,0)-type tensor $e^i$. Linear
transforms of vectors (i.e. Jones matrices) correspond to (1,1)-type
tensors, $[J_p]^i_j$ (note that p is not, as yet, a tensor index here,
but simply a station "label", which is emphasized by hiding it
within brackets). The voltage response of station p is then

$$[v_p]^i = [J_p]^i_\alpha e^\alpha,$$

where $\alpha$ is a summation index. The coherency of two voltage
or EMF vectors is defined via the outer product$^8$ $e^i \bar{e}_j$, yielding a
(1,1)-type tensor, i.e. a matrix:

$$[V_{pq}]^i_j = 2 \langle [v_p]^i [\bar{v}_q]_j \rangle.$$

Combining the two equations above gives us

$$[V_{pq}]^i_j = 2 \langle ([J_p]^i_\alpha e^\alpha) ([\bar{J}_q]^\beta_j \bar{e}_\beta) \rangle = [J_p]^i_\alpha (2 \langle e^\alpha \bar{e}_\beta \rangle) [\bar{J}_q]^\beta_j.$$

And now, defining the source brightness tensor $B^i_j$ as $2 \langle e^i \bar{e}_j \rangle$,
we arrive at

$$[V_{pq}]^i_j = [J_p]^i_\alpha B^\alpha_\beta [\bar{J}_q]^\beta_j, \qquad (9)$$

which is exactly the RIME of Eq. (1), rewritten using
Einstein notation. Not surprisingly, it looks somewhat more 
bulky than the original - matrix multiplication, after all, is a 
more compact notation for this particular operation. 

Now, since we can commute the terms in an Einstein sum 
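(as long as they take their indices with them, see Sect. A.5.2),
we can split off the two J terms into a sub-product which we'll
designate as $[J_{pq}]$:

$$[J_p]^i_\alpha B^\alpha_\beta [\bar{J}_q]^\beta_j = ([J_p]^i_\alpha [\bar{J}_q]^\beta_j) B^\alpha_\beta = [J_{pq}]^{i\beta}_{\alpha j} B^\alpha_\beta. \qquad (10)$$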

What is this $J_{pq}$? It is a (2,2)-type tensor, corresponding to
2 x 2 x 2 x 2 = 16 numbers. Mathematically, it is the exact equivalent
of the outer product $J_p \otimes J_q^*$, giving us the 4 x 4 form of the
RIME, as originally formulated by Hamaker et al. (1996). The
components of the tensor given by $[J_{pq}]^{i\beta}_{\alpha j} B^\alpha_\beta$ correspond exactly
to the components of a 4-vector produced via multiplication of
the 4 x 4 matrix $J_p \otimes J_q^*$ by the 4-vector $SI$ (see Paper I, Smirnov
2011a, Sect. 6.1).
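
The equivalence between the (2,2)-type tensor and the 4 x 4 outer-product form can be verified with a few lines of plain Python. This is a sketch with made-up Jones and brightness values; the storage order `T[i][beta][alpha][j]`, the row-major flattening of the brightness matrix into a 4-vector, and the function names are my own choices for illustration. Note that the 4-vector here is the flattened brightness matrix itself, which stands in for the product of the conversion matrix and the Stokes vector:

```python
def outer_tensor(Jp, Jq):
    """T[i][beta][alpha][j] = Jp[i][alpha] * conj(Jq[j][beta])."""
    r = range(2)
    return [[[[Jp[i][a] * complex(Jq[j][b]).conjugate() for j in r]
              for a in r] for b in r] for i in r]

def apply_tensor(T, B):
    """V[i][j] = sum over alpha, beta of T[i][beta][alpha][j] * B[alpha][beta]."""
    r = range(2)
    return [[sum(T[i][b][a][j] * B[a][b] for a in r for b in r)
             for j in r] for i in r]

def kron_conj(Jp, Jq):
    """4x4 matrix Jp (x) conj(Jq): rows indexed by (i, j), columns by (alpha, beta)."""
    r = range(2)
    return [[Jp[i][a] * complex(Jq[j][b]).conjugate() for a in r for b in r]
            for i in r for j in r]

# Hypothetical Jones matrices and brightness matrix.
Jp = [[1.0, 0.1j], [0.2, 0.9]]
Jq = [[0.8, -0.1], [0.05j, 1.1]]
B = [[1.2, 0.1 + 0.05j], [0.1 - 0.05j, 0.8]]

# Tensor route: contract the 16-component tensor with B.
V_tensor = apply_tensor(outer_tensor(Jp, Jq), B)

# 4x4 route: multiply the outer-product matrix by the flattened brightness.
M = kron_conj(Jp, Jq)
b_vec = [B[a][b] for a in range(2) for b in range(2)]
v_vec = [sum(M[k][l] * b_vec[l] for l in range(4)) for k in range(4)]

# 2x2 route: the ordinary matrix product Jp B Jq^H, written out by index.
V_matrix = [[sum(Jp[i][a] * B[a][b] * complex(Jq[j][b]).conjugate()
                 for a in range(2) for b in range(2))
             for j in range(2)] for i in range(2)]

err_4x4 = max(abs(V_tensor[i][j] - v_vec[2 * i + j])
              for i in range(2) for j in range(2))
err_2x2 = max(abs(V_tensor[i][j] - V_matrix[i][j])
              for i in range(2) for j in range(2))
print(err_4x4, err_2x2)  # both zero up to floating-point rounding
```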

Finally, note that we've been "hiding" the p and q station 
labels inside square brackets, since they don't take part in any 
tensor operations above. Upon further consideration, this distinc- 
tion proves to be somewhat artificial. Let's treat p and q as free 
tensor indices in their own right.$^9$ The set of all Jones matrices
for the array can then be represented by a (2,1)-type tensor $J^{pi}_j$.
All the visibilities measured by the array as a whole will then be 

8 This definition is actually fraught with subtleties: see Sect. A.6.1
for a discussion. 

9 Strictly speaking, all tensor indices should have the same range. 
I'm implicitly invoking mixed-dimension tensors at this point. See 
Sect. A.6.2 for a formal treatment of this issue.
represented by a (2,2)-type tensor $V^{pi}_{qj}$, and we can then rewrite
Eq. (9) as

$$V^{pi}_{qj} = J^{pi}_\alpha B^\alpha_\beta \bar{J}^\beta_{qj}, \qquad (11)$$



...which is now a single equation for all the visibilities mea- 
sured by the array, en masse (as opposed to the visibility of a 
single baseline given by Eq. ([T} or (|9]i)- Such manipulation of 
tensor indices may seem like a purely formal trick, but will in 
fact prove very useful when we consider generalized RIMEs be- 
low. 

Note that the brightness tensor B is self-conjugate (or 
Hermitian), in the sense that B^. = Bf. The visibility tensor V, 
on the other hand, is only Hermitian with respect to a permuta- 
tion of p and q: \l p '. = V*'.. 
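
As a numerical illustration of Eqs. (9)-(11), the NumPy sketch below evaluates the visibility tensor of a small array with `numpy.einsum` and compares one baseline against the matrix product and against the 4x4 (outer product) form. The array size, the random Jones matrices and the storage layout (`J[p, i, a]` holding $[J_p]^i_\alpha$) are assumptions made for the example; nothing here is prescribed by the formalism itself.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_complex(*shape):
    """Random complex array standing in for physically meaningful terms."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


n_stations = 3                          # assumed array size
J = random_complex(n_stations, 2, 2)    # J[p, i, a] holds [J_p]^i_a
e = random_complex(2, 2)
B = e @ e.conj().T                      # a Hermitian 2x2 brightness matrix B[a, b]

# Eq. (11): every baseline at once, V[p, i, q, j]
V = np.einsum("pia,ab,qjb->piqj", J, B, J.conj())

# Matrix RIME for a single baseline: V_pq = J_p B J_q^H
p, q = 0, 2
V_pq = J[p] @ B @ J[q].conj().T

# Eq. (10): the (2,2)-type tensor J_pq applied to B ...
J_pq = np.einsum("ia,jb->ijab", J[p], J[q].conj())
V_pq_tensor = np.einsum("ijab,ab->ij", J_pq, B)

# ... and the same numbers arranged as a 4x4 matrix acting on a 4-vector
V_pq_4x4 = (np.kron(J[p], J[q].conj()) @ B.reshape(4)).reshape(2, 2)
```

All three routes give the same 2x2 visibility for the chosen baseline, and `V` is Hermitian only under the simultaneous exchange of (p, i) with (q, j).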



4. Generalizing the RIME 

In this section, I will put tensor notation to work to incorporate 
image-plane effects and mutual coupling and beamforming into 
a generalized RIME hinted at in Sect. 2. This shows how the formalism
may be used to derive a few different forms of the RIME
for various instrumental scenarios. Note that the resulting equa- 
tions are somewhat speculative, and not necessarily applicable to 
any particular real-life instrument. The point of the exercise is to 
demonstrate the flexibility of the formalism in deriving RIMEs 
that go beyond the capability of the Jones formalism. 

First, let's set up some indexing conventions. I'll use i, j, k, ...
for free indices that run from 1 to N = 2 (or 3, see below), i.e.
for those that refer to EMF components, or voltages on paired
receptors, and $\alpha, \beta, \gamma, \ldots$ for summation indices
in the same range. I shall refer to such indices as 2-indices (or
3-indices). For free indices that refer to stations or disparate
receptors (and run from 1 to n), I'll use p, q, r, s, and for the
corresponding summation indices, $\sigma, \tau, \upsilon, \phi,
\ldots$ I shall refer to these as station indices.

Consider again the n arbitrary receptors of Sect. 2 observing a
single source. The source EMF is given by the tensor $e^i$. All
effects between the source and the receptor, up to and not including
the voltage gain, can be described by a (2,1)-type tensor,
$Z^{pi}_j$. This implies that they are different for each receptor
p. The PB response of the receptor can be described by a (1,1)-type
tensor, $E^p_i$. Finally, the geometric phase delay of each receptor
is a (1,0)-type tensor, $K^p$.

Let's take this in small steps. The EMF field arriving at each 
receptor p is given by 



$$e'^{pi} = K^p Z^{pi}_\alpha e^\alpha \tag{12}$$



(remembering that we implicitly sum over $\alpha$ here). If we
consider just one receptor in isolation, we can re-write the
equation for one specific value of p. This corresponds to the
familiar matrix/vector product:

$$e'^i = K Z^i_\alpha e^\alpha, \quad \mathrm{or} \quad \boldsymbol{e}' = K \boldsymbol{Z} \boldsymbol{e},$$

where Z is the Jones matrix describing the image-plane effect for
this particular receptor, and K is the geometric phase delay. The
receptor translates the EMF vector $e'^i$ into a scalar voltage
$v'$. This is done via its PB response tensor, $E_i$, which is just
a row vector:

$$v' = E_\beta e'^\beta.$$

Now, if we put the receptor index p back in the equations, we
arrive at the tensor expression:

$$v'^p = E^p_\beta K^p Z^{p\beta}_\alpha e^\alpha \tag{13}$$

We're now summing over $\alpha$ when applying image-plane effects,
and over $\beta$ when applying the PB response.

Finally, cross-talk and/or mutual coupling scrambles the receptor
voltages. If $v'^p$ is the "ideal" voltage vector without
cross-talk, then we need to multiply it by an n x n matrix (i.e. a
(1,1)-type tensor) to apply cross-talk:

$$v^p = Q^p_\sigma v'^\sigma \tag{14}$$

The final equation for the voltage response of the array is then:

$$v^p = Q^p_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha e^\alpha. \tag{15}$$

We're now summing over $\sigma$ (which ranges over all receptors),
$\alpha$ and $\beta$.

The visibility tensor $V^p_q$, containing all the pairwise
correlations between the receptors, can then be computed as
$2\langle v^p \bar{v}_q \rangle$. Applying Eq. (15), this becomes

$$V^p_q = 2\langle [Q^p_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha e^\alpha] [Q^q_\tau E^\tau_\delta K^\tau Z^{\tau\delta}_\gamma e^\gamma]^* \rangle$$

This uses a different set of summation indices within each pair of
brackets, since each sum is computed independently. Doing the
conjugation and rearranging the terms around, we arrive at:

$$V^p_q = Q^p_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{\tau\delta} \bar{K}_\tau \bar{E}^\delta_\tau \bar{Q}^\tau_q. \tag{16}$$

This is the tensor form of a RIME for our hypothetical array. Note
that structurally it is quite similar to the ACM form of Sect. 2
(e.g. Eq. ©), but with one principal difference: the (2,1)-type Z
tensor describes receptor-specific effects, which cannot be
expressed via a matrix multiplication. Note also that the other
awkwardness encountered in Sect. 2, namely the difficulty of
putting geometric phase delay and fringe stopping into their proper
places in the equation, is also elegantly addressed by the tensor
formalism. Additional phase delay tensors can be inserted at any
point of the equation.

4.1. Wolf vs. Jones formalisms

Equation (16) generalizes both the classical Jones formalism (JF),
and the three-component Wolf formalism (WF). The JF is constructed
on top of a two-dimensional vector space: EMF vectors have two
components, the indices $\alpha, \beta, \ldots$ range from 1 to 2,
and the B tensor is the usual 2x2 brightness matrix. The WF
corresponds to a three-dimensional vector space, with the B tensor
becoming a 3 x 3 matrix.

Recall (Sect. 2.2) that the 3x3 brightness matrix is derived from
the 2x2 brightness matrix via a 3 x 2 transformation matrix T. This
derivation can also be expressed as an Einstein sum:

$$[B^{(3)}]^i_j = T^i_\alpha [B^{(2)}]^\alpha_\beta \bar{T}^\beta_j,$$

where T is the tensor equivalent of the transformation matrix.

In subsequent formulations, I will make no distinction between the
JF and the WF unless necessary, with the implicit understanding
that the appropriate indices range from 1 to 2 or 3, depending on
which version of the formalism is needed.

4.2. Decomposing the J matrix

If we isolate the left-hand sub-product in Eq. (16),

$$Q^p_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha,$$

and track down the free indices in this tensor expression - p and
$\alpha$ - we can see that the product is a (1,1) tensor,
$J^p_\alpha$. We can then rewrite the equation in a more compact
form:

$$V^p_q = J^p_\alpha B^\alpha_\beta \bar{J}^\beta_q \tag{17}$$

Not surprisingly, this is just the ACM RIME of Eq. © 
rewritten in Einstein notation. In hindsight, this shows how we 
can break down the full-array response matrix J into a tensor 
product of physically meaningful terms. Note how this parallels 
the situation of the 2x2 form RIME: even though each 2x2 vis- 
ibility matrix, in principle, depends on only two Jones matrices 
(Eq. (1)), in real-life applications we almost always need to form
them up from a chain of several different Jones terms, as in e.g. 
the onion form (Eq. Q). What the tensor formulation offers is 
simply a more capable means of computing the response matri- 
ces (more capable than a matrix product, that is) from individual 
propagation tensors. 
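
The relation between the flat sum of Eq. (16) and the compact form of Eq. (17) can be checked numerically. The sketch below is only an illustration: the receptor count, the random tensors and the axis order used to store each tensor (for example `Z[s, b, a]` for $Z^{\sigma\beta}_\alpha$) are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_complex(*shape):
    """Random complex array standing in for a propagation tensor."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


n = 4                                       # assumed number of receptors
Q = random_complex(n, n)                    # Q[p, s]: cross-talk
E = random_complex(n, 2)                    # E[s, b]: PB response
K = np.exp(2j * np.pi * rng.random(n))      # K[s]: geometric phase delay
Z = random_complex(n, 2, 2)                 # Z[s, b, a]: image-plane effects
e = random_complex(2, 2)
B = e @ e.conj().T                          # B[a, g]: Hermitian brightness

# Left-hand sub-product of Eq. (16): the full-array response J[p, a]
J = np.einsum("ps,sb,s,sba->pa", Q, E, K, Z)

# Eq. (17): V^p_q = J^p_a B^a_g conj(J)^g_q
V_17 = np.einsum("pa,ag,qg->pq", J, B, J.conj())

# Eq. (16) written out as one flat Einstein sum
V_16 = np.einsum(
    "ps,sb,s,sba,ag,tdg,t,td,qt->pq",
    Q, E, K, Z, B, Z.conj(), K.conj(), E.conj(), Q.conj(),
)
```

Note that the station index `s` appears in three operands of the first `einsum` call. This is what lets Z be receptor-specific, which a plain chain of matrix products cannot express.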

4.3. Characterizing propagation tensors 

Since the original formulation of the matrix RIME, a number
of standard types of Jones matrices have seen widespread use. 
The construction of Jones matrices actually follows fairly simple 
rules (even if their behaviour as a function of time, frequency and 
direction may be quite complicated). A number of similar rules 
may be proposed for propagation tensors: 

- A tensor that translates the EMF vector into another vector 
(e.g., Faraday rotation) must necessarily have an upper and a 
lower 2-index. 

- A tensor that translates both components of the EMF field
equally (i.e. a scalar operation such as phase delay) does not
need any 2-indices at all.

- A tensor transforming the EMF vector into a scalar (e.g., the 
voltage response of a receptor) must have a lower 2-index. 

- A tensor for an effect that is different across receptors must 
have a station index. 

- A tensor for an effect that maps per-receptor quantities onto 
per-receptor quantities must have two station indices (upper 
and lower). 

Some examples of applying these rules: 

- Faraday rotation translates vectors, so it must have an upper
and a lower 2-index. If different across stations and/or receptors,
it must also have a station index. This suggests that the tensor
looks like $F^{pi}_j$ (or $F^i_j$).

- Phase delay operates on the EMF vector as a scalar. It is
different across receptors, hence its tensor looks like $K^p$.

- PB response translates the EMF vector into a scalar voltage, and
must therefore have one lower 2-index. It is usually different
across stations and/or receptors, hence its tensor looks like
$E^p_i$.

- Cross-talk or mutual coupling translates receptor voltages into
receptor voltages, so it needs two station indices. Its tensor
looks like $Q^p_q$.

- If mutual coupling needs to be expressed in terms of the EMF
field at each receptor instead, then it may need two 2-indices and
two station indices, giving a (2,2)-type tensor, $Q^{pi}_{qj}$.
Alternatively, this may be combined with the PB response tensor E,
giving the voltage response of each receptor as a function of the
EMF vector at all the other receptors. This would be a (1,2)-type
tensor, $E^p_{qi}$.

4.4. Describing mutual coupling 

Equations (15) and (16) were derived under the perhaps simplistic
assumption that the effect$^{10}$ of mutual coupling can be fully
described via cross-talk between the receptor voltages. That is,
the collection of EMF vectors at each receptor's location was
described by a (2,0)-type tensor, $e'^{pi}$ (Eq. (12)), then
converted into nominal receptor voltages by the PB tensor $E^p_i$
(Eq. (13)), and then converted into actual voltages via a
(1,1)-type cross-talk tensor $Q^p_q$ (Eq. (14)).

The underlying assumption here is that each receptor's actual 
voltage can be derived from the nominal voltages alone. To see 
why this may be simplistic, consider a hypothetical array of
n identical dipoles in the same orientation, parallel to the x axis.
Nominally, the dipoles are then only sensitive to the x component
of the EMF, which, in terms of the PB tensor E, means that
$E^p_2 = 0$ for all p. Consequently, the actual voltages $v^p$ given by
this model will only depend on the x component of the EMF. If 
mutual coupling causes any dipole to be sensitive to the y com- 
ponent of the EMF seen at another dipole, this results in a con- 
tamination of the measured signal that cannot be described by 
this voltage-only cross-talk model. 

A more general approach is to describe the voltage response of each
receptor as a function of the EMF at all the receptor locations,
rather than the nominal receptor voltages. This requires a
(1,2)-type tensor:

$$v^p = E^p_{\sigma\beta} e'^{\sigma\beta}$$

This $E^p_{qi}$ tensor (consisting of n x n x 2 complex numbers)
then describes the PB response and the mutual coupling together.
The simpler cross-talk-only model corresponds to $E^p_{qi}$ being
decomposable into a product of two (1,1)-type tensors (n x n + n x 2
complex numbers), as $E^p_{qi} = Q^p_q E^q_i$. This model will
perhaps prove to be sufficient in real-life applications, but it is
illustrative how simple it is to extend the formalism to the more
complex case.
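
The difference between the two coupling models can be made concrete with a small NumPy sketch. Everything numerical here is hypothetical: three x-oriented dipoles, a weak random cross-talk matrix, and a single hand-picked coupling coefficient (0.05) through which dipole 0 senses the y component of the field at dipole 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3                                    # assumed number of dipoles

# EMF vector at each receptor location, e_prime[s, b]
e_prime = rng.standard_normal((n, 2)) + 1j * rng.standard_normal((n, 2))

# Nominal PB response of x-oriented dipoles: no sensitivity to the y component
E_nominal = np.tile(np.array([1.0 + 0j, 0.0]), (n, 1))      # E_nominal[s, b]

# Weak voltage cross-talk between receptors, Q[p, s]
Q = np.eye(n) + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# Cross-talk-only model: E_ct[p, s, b] = Q[p, s] * E_nominal[s, b]
E_ct = np.einsum("ps,sb->psb", Q, E_nominal)
v_ct = np.einsum("psb,sb->p", E_ct, e_prime)

# The same voltages computed in two steps (nominal voltages, then cross-talk)
v_two_step = Q @ np.einsum("sb,sb->s", E_nominal, e_prime)

# General model: dipole 0 also picks up the y component seen at dipole 1
E_general = E_ct.copy()
E_general[0, 1, 1] = 0.05
v_general = np.einsum("psb,sb->p", E_general, e_prime)
```

In the cross-talk-only tensor every y-component entry `E_ct[:, :, 1]` is zero whatever `Q` is, so no choice of voltage cross-talk can reproduce the extra term present in `v_general[0]`.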

4.5. Describing beamforming 

In modern PAF and AA designs, receptors are grouped into sta- 
tions, and operated in beamformed mode - that is, groups of re- 
ceptor voltages are added up with complex weights to form one 
or more compound beams. The output of a station is then a sin- 
gle complex voltage (strictly speaking, a single complex num- 
ber, since beamforming is usually done after A/D conversion) 
per each compound beam, rather than n individual receptor volt- 
ages. 

Beamforming may also be described in terms of the tensor RIME.
Let's assume N stations, each being an array of
$n_1, n_2, \ldots, n_N$ receptors. The voltage vector $v^a$
registered at station p (where $a = 1 \ldots n_p$) can be described
by Eq. (15). In addition, the voltages are subject to per-receptor
complex gains (which we had quietly ignored up until now), which
corresponds to another term, $g^a$. The output of one beamformer,
f, is computed by multiplying this by a covector of weights, $w_a$:

$$f = w_a v^a = w_a g^a Q^a_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha e^\alpha. \tag{18}$$

10 As opposed to the mechanism, which is considerably more complex,
and outside the scope of this work.



In a typical application, the beamformer outputs are correlated
across stations. In this context, it is useful to derive a compound
beam tensor, which would allow us to treat a whole station as a
single receptor. To do this, we must assume that image-plane effects
are the same for all receptors in a station
($Z^{\sigma\beta}_\alpha \equiv Z^\beta_\alpha$). Furthermore, we
need to decompose the phase term $K^\sigma$ into a common "station
phase" K, and a per-receptor differential delay $(\delta K)^\sigma$,
so that $K^\sigma = K (\delta K)^\sigma$. The latter can be derived
in a straightforward way from the station (or dish) geometry. We
can then collapse some summation indices:

$$f = w_a g^a Q^a_\sigma E^\sigma_\beta (\delta K)^\sigma K Z^\beta_\alpha e^\alpha = \left( w_a g^a Q^a_\sigma E^\sigma_\beta (\delta K)^\sigma \right) K Z^\beta_\alpha e^\alpha = S_\beta K Z^\beta_\alpha e^\alpha. \tag{19}$$



This expression is quite similar to Eq. (15). Now, if for station p
the compound beam tensor is given by $S^p_i$, then a complete RIME
for an interferometer composed of beamformed stations is

$$V^p_q = S^p_\beta K^p Z^{p\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{q\delta} \bar{K}_q \bar{S}^\delta_q, \tag{20}$$

which is very similar to the RIME of Eq. (16), except that the PB
tensor E has been replaced by the station beam tensor S, and
there's no cross-talk between stations. If each station produces a
pair of compound beams (e.g., for the same pointing but sensitive
to different polarizations), then this equation reduces to the
classical matrix RIME, where the E-Jones term is given by a tensor
product. In principle, we could also combine Eqs. (18) and (20)
into one long equation describing both beamforming and
station-to-station correlation.

This shows that a compound beam tensor ($S^p_i$) can always be
derived from the beamformer weights, receptor gains, mutual
coupling terms, element PBs, and station geometry, under the
assumption that image-plane effects are the same across the
aperture of the station. By itself this fact is not particularly
new or surprising, but it's useful to see how the tensor formalism
allows it to be formulated as an integral part of the RIME.

As for the image-plane effects assumption, it is probably safe for
PAFs and small AAs, but perhaps not so for large AAs. If the
assumption does not hold, we're left with an extra $\sigma$ index
in Eq. (19), and may no longer factor out an independent compound
beam tensor $S^p_i$. This situation cannot be described by the
Jones formalism at all, but is easily accommodated by the tensor
RIME.
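
The collapse of per-receptor terms into a compound beam tensor (Eq. (19)) can be sketched in a few lines of NumPy. The station size, the random weights, gains and coupling matrix, and the array layouts are all assumptions of the example; the only structural requirement is the one stated above, namely a single image-plane term Z shared by all receptors of the station.

```python
import numpy as np

rng = np.random.default_rng(3)


def random_complex(*shape):
    """Random complex array standing in for an instrumental term."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


n = 5                                        # assumed receptors in one station
w = random_complex(n)                        # beamformer weights w_a
g = random_complex(n)                        # receptor gains g^a
Q = np.eye(n) + 0.1 * random_complex(n, n)   # mutual coupling Q[a, s]
E = random_complex(n, 2)                     # element PB responses E[s, b]
dK = np.exp(2j * np.pi * rng.random(n))      # differential delays (delta K)^s
K = np.exp(2j * np.pi * rng.random())        # common station phase
Z = random_complex(2, 2)                     # shared image-plane term Z[b, a]
e = random_complex(2)                        # source EMF vector

# Receptor by receptor: nominal voltages with K^s = K * dK[s],
# then coupling, gains and beamformer weights
v_nominal = np.einsum("sb,ba,a->s", E, Z, e) * (K * dK)
f_direct = w @ (g * (Q @ v_nominal))

# Eq. (19): collapse everything station-specific into a compound beam S[b]
S = np.einsum("a,a,as,sb,s->b", w, g, Q, E, dK)
f_compound = K * (S @ Z @ e)
```

If Z were stored per receptor (an extra `s` axis), the last two lines could no longer be written: `S` would not factor out, in line with the remark about large AAs.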

4.6. A classical dual-pol RIME 

Equation (16) describes all correlations in an interferometer in a
single (1,1)-type tensor (matrix). Contrast this to Eq. (11), which
does the same via a (2,2)-type tensor, by grouping pairs of
receptors per station. Since the latter is a more familiar form in
radio interferometry, it may be helpful to recast Eq. (16) in the
same manner. First, we mechanically replace each receptor index
($p, q, \sigma, \tau$) by pairs of indices
($pi, qj, \sigma\upsilon, \tau\phi$), corresponding to a station
and a receptor within the station:

$$V^{pi}_{qj} = Q^{pi}_{\sigma\upsilon} E^{\sigma\upsilon}_\beta K^{\sigma\upsilon} Z^{\sigma\upsilon\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{\tau\phi\delta} \bar{K}_{\tau\phi} \bar{E}^\delta_{\tau\phi} \bar{Q}^{\tau\phi}_{qj}.$$

Next, we assume colocation (since the per-station receptors are,
presumably, colocated) and simplify some tensors. In particular, K
and Z can lose their receptor indices:

$$V^{pi}_{qj} = Q^{pi}_{\sigma\upsilon} E^{\sigma\upsilon}_\beta K^\sigma Z^{\sigma\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{\tau\delta} \bar{K}_\tau \bar{E}^\delta_{\tau\phi} \bar{Q}^{\tau\phi}_{qj}.$$

This equation cannot as yet be expressed via the Jones for- 
malism, since for any p, q, the sum on the right-hand side con- 
tains terms for other stations (cr, r). To get to a Jones-equivalent 
formalism, we need to remember the closed system assumption, 
i.e. no cross-talk or mutual coupling between stations (Sect. fTTZl i. 
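This corresponds to $Q^{pi}_{qj} = 0$ for $p \neq q$. Q is then
equivalent to a tensor of one rank lower, with one station index
eliminated:

$$V^{pi}_{qj} = Q^{pi}_\upsilon E^{p\upsilon}_\beta K^p Z^{p\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{q\delta} \bar{K}_q \bar{E}^\delta_{q\phi} \bar{Q}^\phi_{qj}. \tag{21}$$

For any p, q, this is now exactly equivalent to a Jones-formalism
RIME of the form:

$$V_{pq} = Q_p E_p K_p Z_p B Z_q^H K_q^H E_q^H Q_q^H,$$

where $K_p$ is a scalar, and the rest are full 2x2 matrices. The
$Q_p$ term here incorporates the traditional G-Jones (receiver
gains) and D-Jones (polarization leakage). Finally, if we assume no
polarization leakage (i.e. no cross-talk between receptors), then
$Q^{pi}_j = 0$ for $i \neq j$, and we can lose another index:

$$V^{pi}_{qj} = Q^{pi} E^{pi}_\beta K^p Z^{p\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{q\delta} \bar{K}_q \bar{E}^\delta_{qj} \bar{Q}_{qj}. \tag{22}$$

In the Jones formalism, this is equivalent to $Q_p$ being a
diagonal matrix for any given p.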

4.7. A full-sky tensor RIME

By analogy with the matrix RIME (see Paper I, Smirnov 2011a,
Sect. 3), we can extend the tensor formalism to the full-sky case.
This does not lead to any new insights at present, but is given
here for the sake of completeness.

When observing a real sky, each receptor is exposed to the
superposition of the EMFs arriving from all possible directions.
Let's begin with Eq. (16), and assume the Q term is a DIE, and the
rest are DDEs. For a full-sky RIME, we need to integrate the
equation over all directions as

$$V^p_q = Q^p_\sigma \left( \int_{4\pi} E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{\tau\delta} \bar{K}_\tau \bar{E}^\delta_\tau \, d\Omega \right) \bar{Q}^\tau_q,$$

which, projected into lm coordinates, gives us:

$$V^p_q = Q^p_\sigma \left( \iint_{lm} E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{\tau\delta} \bar{K}_\tau \bar{E}^\delta_\tau \, \frac{dl\,dm}{n} \right) \bar{Q}^\tau_q. \tag{23}$$

Let's isolate a few tensor sub-products and collapse indices.
First, we can introduce an "apparent sky" tensor:

$$\mathcal{B}^p_q = E^p_\beta Z^{p\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{q\delta} \bar{E}^\delta_q.$$

Note that this is an n x n matrix. Physically,
$\mathcal{B}^p_q(l, m)$ corresponds to the coherency "seen" by
receptors p and q as a function of direction. Next, we introduce a
phase tensor:

$$K^p_q = K^p \bar{K}_q,$$

which is another n x n matrix. Note that we reuse the letter K
here, but there shouldn't be any confusion with the "other" $K^p$,
since the tensor type is different. Each component of this tensor
is given by

$$K^p_q = \exp\left[ -2\pi i \left( u_{pq} l + v_{pq} m + w_{pq} (n - 1) \right) \right].$$

Equation (23) then becomes simply:

$$V^p_q = Q^p_\sigma \left( \iint_{lm} \mathcal{B}^\sigma_\tau K^\sigma_\tau \, \frac{dl\,dm}{n} \right) \bar{Q}^\tau_q, \tag{24}$$

where the integral then corresponds to n x n element-by-element
Fourier transforms, and all the DDE-related discussions of Papers I
(Smirnov 2011a, Sect. 3) and II (Smirnov 2011b, Sect. 2) apply.

4.7.1. The apparent coherency tensor 

If we designate the value of the integral in Eq. (23) by the
apparent coherency tensor $X^\sigma_\tau$, we arrive at the simple
equation

$$V^p_q = Q^p_\sigma X^\sigma_\tau \bar{Q}^\tau_q,$$

which ties together the observed correlation matrix, $V^p_q$, and
the apparent coherency tensor $X^\sigma_\tau$. The physical meaning
of each element of $X^\sigma_\tau$ is, obviously, the apparent
coherency observed by receptor pair $\sigma$ and $\tau$. The
cross-talk term Q "scrambles up" the apparent coherencies among all
receptors. Note that this is similar to the coherency matrix X (or
$X_{pq}$) used in the classical formulation of the matrix RIME
(Hamaker et al. 1996; Smirnov 2011a, Sect. 1.7).



4.8. Coordinate transforms, or whither tensors? 

Einstein summation by itself is a powerful notational conve- 
nience for expressing linear combinations of multidimensional 
arrays, one that can be gainfully employed without regard to the 
finer details of tensors. The formulations of this paper may in 
fact be read in just such a manner, especially as they do not seem 
to make explicit use of that many tensor-specific constructs. It is 
then a fair question whether we need to invoke the deeper con- 
cepts of tensor algebra at all. 

There is one tensor property, however, that is crucial within 
the context of the RIME, and that is behaviour under coordinate 
transforms. In the formulations above, I did not specify a coor- 
dinate system. As in the case of the matrix RIME, the equations 
hold under change of coordinate frame, but their components 
must be transformed following certain rules. The general rules 
for tensors are given in Sect. A.4 (Eq. (A.1)); for the
mixed-dimension tensors employed in this paper, coordinate
transforms only affect the core $\mathbb{C}^2$ (or $\mathbb{C}^3$)
vector space, and do not apply to station indices (the latter are
said to be invariant, see
Sect. lAA2b . 

As long as we know that something is a tensor of a certain 
type, we have a clear rule for coordinate transformations given 
by Eq. (A.1). However, Einstein notation can be employed to
form up arbitrary expressions, which are not necessarily proper 
tensors unless the rigorous rules of tensor algebra are followed 
(see Appendix A). This argues against a merely mechanical use
of Einstein summation, and makes it worthwhile to maintain the 
mathematical rigour that enables us to clearly follow whether 
something is a tensor or not. 



5. Implementation aspects 

Superficially, evaluation of Einstein sums seems straightforward 
to implement in software, since it is just a series of nested loops. 
Upon closer examination, it turns out to raise some non-trivial 
performance and optimization issues, which I'll look at in this 
section. 



5.1. A general formula for FLOP counts 

Consider an Einstein sum that is a product of k tensors (over a
D-dimensional vector space), with n free and m summation indices.
I'll call this an (n, m, k)-calibre product. Let's count the number
of floating-point operations (ops for short) required to compute
the result. The resulting tensor has $D^n$ components. Each
component is a sum of $D^m$ individual products (thus $D^m - 1$
additions); each product incurs k - 1 multiplications. The total op
count is thus

$$N^{(D)}_{\mathrm{ops}}(n, m, k) = D^n \left( D^m (k - 1) + D^m - 1 \right) = D^n (D^m k - 1). \tag{25}$$

For mixed-dimensionality tensors, a similar formula may be derived
by replacing $D^n$ and $D^m$ with $D_1^{n_1} D_2^{n_2}$ and
$D_1^{m_1} D_2^{m_2}$, where the two dimensions are $D_1$ and
$D_2$, and the index counts per each dimensionality are numbered
accordingly.
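
Eq. (25) is simple enough to write as a small helper, which is handy for checking op counts when comparing formulations. The function name `einsum_ops` and the dictionary of cases are of course just illustrative choices; the single-dimension form of the formula is used.

```python
def einsum_ops(n, m, k, D=2):
    """Op count of Eq. (25) for an (n, m, k)-calibre product over D dimensions."""
    multiplications = D**n * D**m * (k - 1)   # k - 1 per product, D^m products per component
    additions = D**n * (D**m - 1)             # summing D^m products per component
    return multiplications + additions


examples = {
    "2x2 matrix product": einsum_ops(2, 1, 2, D=2),
    "4x4 matrix product": einsum_ops(2, 1, 2, D=4),
    "outer product of 2x2 matrices": einsum_ops(4, 0, 2, D=2),
    "4x4 matrix times 4-vector": einsum_ops(1, 1, 2, D=4),
    "(2,2)-type tensor times matrix": einsum_ops(2, 2, 2, D=2),
}
```

The last two entries come out equal, as expected for two arrangements of the same 16 x 4 numbers.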

Consider a few familiar examples:

- Multiplication of 2 x 2 matrices, $A^i_\alpha B^\alpha_j$:
$N^{(2)}_{\mathrm{ops}}(2, 1, 2) = 12$.

- Multiplication of 4 x 4 matrices:
$N^{(4)}_{\mathrm{ops}}(2, 1, 2) = 112$.

- Outer product of 2 x 2 matrices, $A^i_j B^k_l$:
$N^{(2)}_{\mathrm{ops}}(4, 0, 2) = 16$.

- Multiplication of a 4 x 4 matrix by a 4-vector,
$A^i_\alpha x^\alpha$: $N^{(4)}_{\mathrm{ops}}(1, 1, 2) = 28$.

- The equivalent operation (see Eq. (10)) of multiplying a
(2,2)-type tensor (with D = 2) by a matrix,
$J^{i\beta}_{j\alpha} B^\alpha_\beta$:
$N^{(2)}_{\mathrm{ops}}(2, 2, 2) = 28$.



5.2. Partitioning an Einstein sum 

Mathematically equivalent formulations can often incur
significantly different numbers of ops. For example, in Paper I
(Smirnov 2011a, Sect. 6.1), I already noted that a straightforward
implementation of a 2 x 2 RIME is cheaper than the same equation
in 4 x 4 form, although the specific op counts given therein are
in error$^{11}$.

Let's consider a 2 x 2 RIME of the form of Eq. (3), with two sets
of Jones terms, which we'll designate as D and E. We then have the
following fully-equivalent formulations in 2 x 2 and 4 x 4 form:

$$V_{pq} = D_p E_p B E_q^H D_q^H, \tag{26}$$

$$v_{pq} = (D_p \otimes D_q^*)(E_p \otimes E_q^*) S I, \tag{27}$$

while in tensor notation the same equation can be formulated as

$$[V_{pq}]^i_j = [D_p]^i_\alpha [E_p]^\alpha_\beta B^\beta_\gamma [\bar{E}_q]^\gamma_\delta [\bar{D}_q]^\delta_j. \tag{28}$$



11 Specifically, Paper I claims 128 ops per Jones term in the 4x4
formalism: 112 to multiply two 4x4 matrices, and another 16 for the
outer product. These numbers are correct per se. However, a 4 x 4
RIME may in fact be evaluated in a more economical order, namely as
a series of multiplications of a 4-vector by a 4 x 4 matrix. As
seen above, this costs 28 ops per each matrix-vector product, plus
16 for the outer product, for a total of only 44 ops per Jones
term.

The cost of this Einstein sum is
$N^{(2)}_{\mathrm{ops}}(2, 4, 5) = 316$ ops. In comparison, the 2x2
form incurs 4 matrix products, for a total of only 48 ops, while
the 4x4 form incurs$^{12}$ two outer products and two 4x4
matrix/vector products, for a total of 88. For longer expressions
with more Jones terms (i.e. larger m), brute-force Einstein
summation does progressively worse.

It is easy to see the source of this inefficiency. In the innermost
loop (say, over j), only the rightmost term $[\bar{D}_q]^\delta_j$
is changing, so it is wasteful to repeatedly take the product of
the other four terms at each iteration. We can trim some of this
waste by computing things in a slightly more elaborate order. Let's
split up the computation as

$$\underbrace{\left( [D_p]^i_\alpha [E_p]^\alpha_\beta B^\beta_\gamma [\bar{E}_q]^\gamma_\delta \right)}_{A^i_\delta} [\bar{D}_q]^\delta_j, \tag{29}$$

and compute $A^i_\delta$ first (costing
$N^{(2)}_{\mathrm{ops}}(2, 3, 4) = 124$ ops), followed by
$A^i_\delta [\bar{D}_q]^\delta_j$ (costing
$N^{(2)}_{\mathrm{ops}}(2, 1, 2) = 12$ ops). This is an
improvement, but we don't have to stop here: a similar split can be
applied to $A^i_\delta$ in turn, and so on, ultimately yielding the
following sequence of operations:

$$\left( \left( \left( [D_p]^i_\alpha [E_p]^\alpha_\beta \right) B^\beta_\gamma \right) [\bar{E}_q]^\gamma_\delta \right) [\bar{D}_q]^\delta_j \tag{30}$$

But this is just a sequence of 2x2 matrix products, i.e. exactly
the same computations that occur in the 2x2 formulation! And on the
other hand, as already intimated by Eq. (10), the 4x4 form is
equivalent to a different partitioning of the same expression,
namely

$$\left( [D_p]^i_\alpha [\bar{D}_q]^\delta_j \right) \left( \left( [E_p]^\alpha_\beta [\bar{E}_q]^\gamma_\delta \right) B^\beta_\gamma \right). \tag{31}$$

The crucial insight here is that different partitionings of the
computation in Eq. (28) incur different numbers of ops.

Let's look at what happens to the calibres during partitioning.
Consider a partition of an (n, m, k)-calibre product into two
steps. The first step computes an (n', m', k')-calibre sub-product
(for example, in Eq. (29), the initial calibre is (2,4,5), and the
sub-product for $A^i_\delta$ has calibre (2,3,4)). At the second
step, the result of this is multiplied with the remaining terms,
resulting in an expression of calibre (n, m - m', k - k' + 1) (in
Eq. (29), this is $A^i_\delta [\bar{D}_q]^\delta_j$, with a calibre
of (2,1,2)). The calibres are strictly related: each of the k terms
goes to either one sub-product or the other, but we incur one extra
term (A) in the partitioning, hence we have k - k' + 1 terms at the
second step. The summation indices are also partitioned between the
steps, hence m - m' are left for the second step. As for the free
indices, their number n' may actually temporarily increase (as in
the case of Eq. (31), where sub-products have the calibre (4,0,2)).
It is straightforward to show that if it does not increase, then

$$N^{(D)}_{\mathrm{ops}}(n', m', k') + N^{(D)}_{\mathrm{ops}}(n, m - m', k - k' + 1) < N^{(D)}_{\mathrm{ops}}(n, m, k),$$

so as long as $n' \leq n$, partitioning always reduces the total
number of ops. In essence, this happens because in the total op
counts, a product is replaced by a sum:
$D^{m'} + D^{m - m'} < D^m$.

From this it follows that the 2x2 form of the RIME is, in a sense,
optimal. The partitioning given by Eq. (30) reduces an
$N^{(2)}_{\mathrm{ops}}(2, m, k)$ operation into m operations of
$N^{(2)}_{\mathrm{ops}}(2, 1, 2)$ each; the latter represents the
smallest possible non-trivial sub-product. (Note that the rank of
any sub-product of Eq. (28) can only be an even number, since all
the terms have an even rank. The minimum non-trivial n' is
therefore 2.)

12 Not counting the SI operation: we assume the coherency vector is
a given, since it's the equivalent of B.
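
The three partitionings discussed in this section can be compared side by side. The sketch below evaluates Eq. (28) as a flat Einstein sum, as the matrix chain of Eq. (30), and in the 4x4-style grouping of Eq. (31), and tallies the op counts with Eq. (25). The random 2x2 terms and the helper names are assumptions of the example; the counts refer to the formula, not to what NumPy executes internally.

```python
import numpy as np


def einsum_ops(n, m, k, D=2):
    """Eq. (25): op count of an (n, m, k)-calibre product."""
    return D**n * (D**m * k - 1)


rng = np.random.default_rng(4)


def random_complex(*shape):
    """Random complex array standing in for a Jones or brightness term."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


Dp, Ep, Eq, Dq = (random_complex(2, 2) for _ in range(4))
e = random_complex(2, 2)
B = e @ e.conj().T

# Eq. (28): brute-force Einstein sum, calibre (2, 4, 5)
V_flat = np.einsum("ia,ab,bg,dg,jd->ij", Dp, Ep, B, Eq.conj(), Dq.conj())
ops_flat = einsum_ops(2, 4, 5)

# Eq. (30): a chain of four 2x2 matrix products
V_chain = (((Dp @ Ep) @ B) @ Eq.conj().T) @ Dq.conj().T
ops_chain = 4 * einsum_ops(2, 1, 2)

# Eq. (31): the 4x4-style partitioning (two outer products, two contractions)
DD = np.einsum("ia,jd->ijad", Dp, Dq.conj())
EE = np.einsum("ab,dg->adbg", Ep, Eq.conj())
V_4x4 = np.einsum("ijad,ad->ij", DD, np.einsum("adbg,bg->ad", EE, B))
ops_4x4 = 2 * einsum_ops(4, 0, 2) + 2 * einsum_ops(2, 2, 2)
```

The three results agree to rounding error, while the tallies are 316, 48 and 88 ops respectively.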

5.3. Dependence optimization 

The partitioning given by Eq. (30) allows for a few alternatives,
corresponding to different order of matrix multiplication. While 
seemingly equivalent, they may in fact represent a huge oppor- 
tunity for optimization, when we consider that in real life, the 
equation needs to be evaluated millions to billions of times, for 
all antenna pairs, baselines, time and frequency bins. Not all the 
terms in the equation have the same time and frequency depen- 
dence: some may be constant, some may be functions of time 
only or frequency only, some may change a lot slower than oth- 
ers - in other words, some may have limited dependence. For 
example, in the "onion" of Eq. (26), if B and $E_p$ have a limited
dependence, say on frequency only, then the inner part of the 
"onion" can be evaluated in a loop over frequency channels, but 
not over timeslots. The resulting savings in ops can be enormous. 

This was already demonstrated in the MeqTrees system
(Noordam & Smirnov 2010), which takes advantage of limited
dependence on-the-fly. A RIME like Eq. (26) is represented by
a tree, which typically corresponds to the following order of
operations:

$$V_{pq} = D_p (E_p B E_q^H) D_q^H,$$

with an outermost loop being over the layers of the "onion",
and inner loops over times and frequencies. When the operands
have limited dependence (as e.g. for the $E_p B E_q^H$ product), the
MeqTrees computational engine automatically skips unneces- 
sary inner loops. Thus the amount of loops is minimized on the 
"inside" of the equation, and grows as we move "out" through 
the layers and add terms with more dependence. I call this de- 
pendence optimization. 

Among the alternative partitionings of Eq. (30), the one that
computes the sub-products with the least dependence first can 
benefit from dependence optimization the most. 
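
The idea can be illustrated with NumPy broadcasting. This is a minimal sketch under assumed conditions (B and the E terms depend on frequency only, the D terms on both time and frequency, and the grid sizes are arbitrary); it is not a description of how MeqTrees is implemented.

```python
import numpy as np

rng = np.random.default_rng(5)


def random_complex(*shape):
    """Random complex array standing in for a Jones or brightness term."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


def herm(x):
    """Conjugate transpose of the trailing 2x2 axes."""
    return x.conj().swapaxes(-1, -2)


n_time, n_freq = 4, 3                       # assumed grid sizes
B = random_complex(n_freq, 2, 2)            # frequency-dependent only
Ep = random_complex(n_freq, 2, 2)           # frequency-dependent only
Eq = random_complex(n_freq, 2, 2)
Dp = random_complex(n_time, n_freq, 2, 2)   # depends on time and frequency
Dq = random_complex(n_time, n_freq, 2, 2)

# Inner layer of the "onion": evaluated once per frequency channel
inner = Ep @ B @ herm(Eq)                   # shape (n_freq, 2, 2)

# Outer layer: the inner result is broadcast over the time axis
V = Dp @ inner @ herm(Dq)                   # shape (n_time, n_freq, 2, 2)

# Reference: re-evaluate the whole chain at every (time, frequency) point
V_ref = np.empty((n_time, n_freq, 2, 2), dtype=complex)
for t in range(n_time):
    for f in range(n_freq):
        V_ref[t, f] = Dp[t, f] @ Ep[f] @ B[f] @ herm(Eq[f]) @ herm(Dq[t, f])
```

Here the two inner matrix products are carried out `n_freq` times rather than `n_time * n_freq` times, and only the two outer products see the full grid.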

5.4. Commutation optimization 

In a 2 x 2 matrix RIME, dependence optimization works best 
if the terms with the least dependence are placed on the inside 
of the equation - if limited dependence happens to apply to D p 
(on the outside) and not E p (on the inside), dependence opti- 
mization can't reduce any inner loops at all. Unfortunately, one 
cannot simply change the order of the matrices willy-nilly, since 
they don't generally commute. However, when dealing with specific
kinds of matrices that do commute, we can do some optimization by
shuffling them around. Real-life RIMEs tend to be full of matrices
with limited commutation properties, such as scalar, diagonal and
rotation matrices (see Paper I, Smirnov 2011a, Sect. 1.6).

In tensor notation, $A^i_\alpha B^\alpha_j$ commute if the
summation indices can be swapped around:
$A^i_\alpha B^\alpha_j = A^\alpha_j B^i_\alpha$. Some of the
commutation rules of 2 x 2 matrices discussed in Paper I (ibid.) do
map to tensor form easily. For example, a diagonal matrix is a
tensor with the property $A^i_j = 0$ for $i \neq j$. If A and B are
both diagonal, then $A^i_\alpha B^\alpha_j$ is only non-zero for
i = j, and commutation is obvious. Other commutation rules are more
opaque: rotation matrices are known to commute among themselves,
but this does not follow from tensor notation at all. Therefore,
opportunities for commutation optimization of a tensor expression
may not be detectable until we recast it into matrix form.
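
The index-swap criterion is easy to test numerically. In the sketch below, `commutes` checks $A^i_\alpha B^\alpha_j = A^\alpha_j B^i_\alpha$ directly with `numpy.einsum`; the specific matrices are arbitrary examples chosen for illustration.

```python
import numpy as np


def commutes(A, B):
    """Check A^i_a B^a_j == A^a_j B^i_a, i.e. AB == BA in matrix terms."""
    lhs = np.einsum("ia,aj->ij", A, B)
    rhs = np.einsum("aj,ia->ij", A, B)
    return np.allclose(lhs, rhs)


def rotation(theta):
    """2x2 rotation matrix for the angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


diag_a = np.diag([1.0 + 2.0j, 3.0 - 1.0j])
diag_b = np.diag([0.5, 2.0j])
generic = np.array([[1.0, 2.0], [3.0, 4.0]])
swap = np.array([[0.0, 1.0], [1.0, 0.0]])

diagonal_pair = commutes(diag_a, diag_b)                  # diagonal matrices
rotation_pair = commutes(rotation(0.3), rotation(1.1))    # rotation matrices
generic_pair = commutes(generic, swap)                    # no special structure
```

The check confirms commutation for the diagonal and rotation pairs and rejects the generic pair, but it only reports the outcome: the reason the rotation matrices commute is not visible in the index pattern.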

5.5. Towards automatic optimization 

For the more complicated expressions of e.g. Eq. (16) or (20),
different partitionings may prove to be optimal. The previous
sections show how to do such analysis formally. In fact, one could
conceive of an algorithm that, given a tensor expression in
Einstein form, searches for an optimal partitioning automatically
(with dependences taken into account).

It is also possible that alternative partitionings may prove to 
be more or less amenable to parallelization and/or implementa- 
tion on GPUs. The tensor formalism may prove to be a valuable 
tool in this area. Recasting a series of tensor operations (matrix 
products, etc.) as a single Einstein sum, such as that in Eq. (28),
shows the computation in its "flattest" (if relatively inefficient) 
form, which can then be repartitioned to yield equivalent but 
more efficient formulations. 
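
A search of this general kind is available in NumPy as `numpy.einsum_path`, which looks for a cheap order of pairwise contractions for a given Einstein sum. The sketch below applies it to Eq. (28) with random 2x2 terms (an assumption of the example). Note the limits of the analogy: NumPy uses its own cost model rather than Eq. (25), and it knows nothing about time or frequency dependence, so it covers only the partitioning part of the problem.

```python
import numpy as np

rng = np.random.default_rng(6)


def random_complex(*shape):
    """Random complex array standing in for a Jones or brightness term."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


Dp, Ep, Eq, Dq = (random_complex(2, 2) for _ in range(4))
e = random_complex(2, 2)
B = e @ e.conj().T

subscripts = "ia,ab,bg,dg,jd->ij"           # Eq. (28) in einsum notation
operands = [Dp, Ep, B, Eq.conj(), Dq.conj()]

# Search for a contraction order; `report` is a human-readable summary
path, report = np.einsum_path(subscripts, *operands, optimize="optimal")

V_flat = np.einsum(subscripts, *operands)                  # single flat sum
V_opt = np.einsum(subscripts, *operands, optimize=path)    # partitioned evaluation
```

Printing `report` shows the chosen sequence of sub-products together with NumPy's estimate of the operation count for the naive and the partitioned evaluation.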



6. Conclusions 

The 2x2 matrix RIME, having proven itself as a very capa- 
ble tool for describing classical radio interferometry, is showing 
its limitations when confronted with PAFs, AAs, and wide-field 
polarimetry. This is due to a number of implicit assumptions un- 
derlying the formalism (plane-polarized, dual, colocated recep- 
tors, closed systems) that have been explicated in this paper. The 
RIME may be rewritten in an array correlation matrix form that 
makes these assumptions clearer, but this struggles to combine 
image-plane effects and mutual coupling in the same equation. 

A more general formalism based on tensors and Einstein no- 
tation is proposed. This reduces to the 2x2 (and 4x4) forms of 
the RIME under the explicated assumptions. The tensor formal- 
ism can be used to step outside the bounds of these assumptions, 
and can accommodate regimes not readily described in 2 x 2 
matrix form. Some examples of the latter are: 

Coupling between closely-packed stations cannot be described 
in the basic 2x2 form (where a Jones chain correspond- 
ing to each signal path is used) without additional extrinsic 
complexity, such as combining the 2x2 equations into some 
kind of larger equation. The tensor formalism describes this 
regime with a single equation. 

Beamforming can only be accommodated in the 2x2 form by 
using separate equations to derive the effective Jones matrix 
of a beamformer. The tensor formalism combines beamforming
and correlation into the same equation.

The Wolf formalism proposed by Carozzi & Woan (2009) uses
3x3 matrices to describe polarimetry in the wide-field 
regime. Again, this can only be mapped to the 2x2 form 
by using external equations to derive special Jones matrices. 
A tensor formalism naturally incorporates the 3-component 
description, and can be used to combine it with the regimes 
above. 

In practice, tensor equations may be implemented via alter- 
native formulations that are mathematically equivalent, but have 
significantly different computing costs (the 2 x 2 vs. 4 x 4 RIMEs 
being but one example). Computing costs may be optimized by a 
repartitioning of the calculations, and some formal methods for 
analysing this have been proposed in this paper. 



I do not propose to completely supplant the 2x2 RIME. 
Where applicable (that is, for the majority of current radio inter- 
ferometric observations), the latter ought to remain the formal- 
ism of choice, both for its conceptual simplicity, and its compu- 
tational efficiency. Even in these cases, the tensor formalism can 
be of value as both a rigorous theoretical tool for analysing the 
limits of the Jones formalism's applicability, and a practical tool 
for deriving certain specialized Jones matrices. 

Acknowledgements. This effort/activity is supported by the European 
Community Framework Programme 7, Advanced Radio Astronomy in Europe, 
grant agreement no.: 227290. 

Appendix A: Elements of tensor algebra 

This appendix gives a primer on elements of tensor theory rele- 
vant to the present paper. Most of this is just a summary of ex- 
isting mathematical theory; the only new results are presented in 
Sect. A.6, where I formally map the RIME into tensor algebra.
More details on tensor theory can be found in any number of
textbooks, for example Synge & Schild (1978) and Simmonds
(1994).

As a preliminary remark, one should keep in mind that there 
are, broadly, two complementary ways of thinking about lin- 
ear and tensor algebra. The coordinate approach defines vec- 
tors, matrices, tensors, etc., in terms of their coordinate com- 
ponents in a particular coordinate system, and postulates some- 
what mechanistic rules for transforming these components under 
change of coordinate frames. The intrinsic (or abstract) approach 
defines these objects as abstract entities (namely, various kinds 
of linear functions) that exist independently of coordinate sys- 
tems, and derives rules for coordinate manipulation as a result. 
For example, in the coordinate approach, a matrix is defined as 
an $n \times m$ array of numbers. In the intrinsic approach, it is a
linear function mapping rank-$m$ vectors to rank-$n$ vectors.

Historically, the coordinate approach was developed first, 
and favours applications of the theory (which is why it is preva- 
lent in e.g. physics and engineering). The intrinsic approach is 
favoured by theoretical mathematicians, since it has proven to 
be a more powerful way of extending the theory. When apply- 
ing the theory to a new field (e.g. to the RIME), the mechanis- 
tic rules may be a necessity (especially when it comes to soft- 
ware implementations), but it is critically important to maintain 
conceptual links to the intrinsic approach, since that is the only 
way to verify that the application remains mathematically sound. 
This appendix therefore tries to explain things in terms of both 
approaches. 

A.1. Einstein notation 

Einstein notation uses upper and lower indices to denote
components of multidimensional entities. For example, $x^i$ refers
to the $i$-th component of the vector $x$ (rather than to $x$ to the
power of $i$), and $y_i$ refers to the $i$-th component of the
covector (see below) $y$. Under the closely-related abstract index
notation, $x^i$ may be used to refer to $x$ itself, with the index
$i$ only serving to indicate that $x$ is a one-dimensional object.
Whether $x^i$ refers to the whole vector or to its $i$-th component
is usually obvious from context (i.e. from whether a specific value
for $i$ has been implied or not).

In general, upper indices are associated with contravariant 
components (i.e., those that transform contravariantly), while 
lower indices refer to covariant ones. The following section will 
define these concepts in detail.

Einstein summation is a convention whereby the same index
appearing in both an upper and lower position implies summation.
That is,

$$x^i y_i \equiv \sum_{i=1}^{N} x^i y_i.$$

An index that is repeated in the same position (e.g. $x_i y_i$) does
not imply summation$^{13}$. I shall be using the summation
convention from this point on, unless otherwise stated.

A.2. Vectors, covectors, co- and contravariancy 

A vector can be thought of (in the coordinate approach) as a
list of $N$ scalars drawn from a scalar field $F$ (e.g., the field of
complex numbers, $\mathbb{C}$). The set of all such vectors forms
the vector space $V = F^N$. For example, the Jones formalism is
based on the vector space $\mathbb{C}^2$, while the Wolf formalism
is based on $\mathbb{C}^3$.

The intrinsic definition of a vector is straightforward but 
somewhat laborious, and can be looked up in any linear alge- 
bra text, so I will not reproduce it here. Intuitively, this is just 
the familiar concept of a length and direction in $N$-dimensional
space. It is important to distinguish the vector as an abstract ge- 
ometrical entity, existing independently of coordinate systems, 
from the list of scalars making up its representation in a specific 
coordinate frame. The terminology is also unhelpfully ambigu- 
ous; I will try to use vector by itself for the abstract entity, and 
column vector, row vector or components when referring to its 
coordinates. 

A coordinate frame in vector space $V$ is defined by a set of
linearly independent basis vectors $e_1, \ldots, e_N$. Given a
coordinate frame, any vector $x$ can be expressed as a linear
combination of the basis vectors: $x = x^i e_i$ (using Einstein
notation). Each component of the vector, $x^i$, is a scalar drawn
from $F$. In matrix notation, the representation of $x$ is commonly
written as a column vector:

$$x = \begin{pmatrix} x^1 \\ \vdots \\ x^N \end{pmatrix}.$$

A covector (or a dual vector) $f$ represents a linear function
from $V$ to $F$, i.e. a linear function $f(x)$ that maps vectors
to scalars (in other words, covectors operate on vectors). This
can also be written as $f : V \to F$. The set of all covectors forms
the dual space of $V$, commonly designated as $V^*$. In a specific
coordinate frame, any covector may be represented by a set of $N$
scalar components $f_i$, so that the operation $f(x)$ becomes a
simple sum: $f(x) = x^i f_i$. In matrix notation, covectors are
written as row vectors:

$$f = (f_1 \ldots f_N),$$

and the operation $f(x)$ is then just the matrix product of a
row vector and a column vector, $fx$. Note that this definition is
completely symmetric: we could as well have postulated a covector
space, and defined the vector as a linear function mapping
covectors onto scalars.



13 This point is occasionally ignored in the literature, i.e. one will see
$x_i y_i$ implying summation over $i$ as well. This is mathematically
sloppy from a purist's point of view.



A.2.1. Coordinate transforms 

In the coordinate approach, both vectors and covectors are
represented by $N$ scalars: the crucial difference is in how these
numbers change under coordinate transforms. Consider two sets of
basis vectors, $E = \{e_i\}$ and $E' = \{e'_i\}$. Each vector of the
basis $E'$ can be represented by a linear combination of the basis
$E$, as $e'_j = a^i_j e_i$. The components $a^i_j$ form the
transformation matrix $A$, which is an $N \times N$, invertible
matrix. The change of basis can also be written as a matrix-vector
product, by laying out the symbols for the basis vectors in a
column, and formally following the rules of matrix multiplication:



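$$\begin{pmatrix} e'_1 \\ \vdots \\ e'_N \end{pmatrix} = \begin{pmatrix} a^1_1 & \cdots & a^N_1 \\ \vdots & & \vdots \\ a^1_N & \cdots & a^N_N \end{pmatrix} \begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix}.$$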



Let $x$ and $x'$ be two column vectors representing the same
vector in basis $E$ and $E'$, respectively; let $f$ and $f'$ be two
row vectors representing a covector.

The crucial bit is this: in order for $x$ and $x'$ to represent the
same vector in both coordinate systems, their components must
transform contravariantly, that is, as

$$x' = A^{-1} x, \quad\text{or}\quad x'^i = \tilde{a}^i_j x^j$$

(in matrix or Einstein notation, respectively), where
$\tilde{a}^i_j$ are the components of $A^{-1}$. In other words, the
components of a vector transform in an opposite way to the
basis$^{14}$, hence contravariantly.

On the other hand, in order for $f$ and $f'$ to represent the same
covector (i.e. the same linear functional), their components must
transform covariantly:

$$f' = fA, \quad\text{or}\quad f'_i = a^j_i f_j.$$

Vectors and linear functions (or covectors) are the two
fundamental ingredients of the RIME.
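
The interplay of the two rules can be checked numerically. The sketch below is an illustration only: the matrix $A$, the component values and the helper names are arbitrary assumptions, and exact rational arithmetic from the standard library is used to avoid rounding. It transforms a vector contravariantly and a covector covariantly, and confirms that the scalar $f(x) = x^i f_i$ is the same in both frames.

```python
from fractions import Fraction

def matvec(M, x):
    """Matrix times column vector."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def vecmat(f, M):
    """Row vector times matrix."""
    return [sum(f[i] * M[i][j] for i in range(len(f))) for j in range(len(M[0]))]

def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [0, 1]]   # transformation matrix (hypothetical values)
x = [4, 6]             # components x^i of a vector in the old frame
f = [3, -1]            # components f_i of a covector in the old frame

x_new = matvec(inverse_2x2(A), x)   # contravariant rule: x' = A^{-1} x
f_new = vecmat(f, A)                # covariant rule:     f' = f A

value_old = sum(fi * xi for fi, xi in zip(f, x))
value_new = sum(fi * xi for fi, xi in zip(f_new, x_new))
```

The components change ($x$ becomes $(-1, 6)$ and $f$ becomes $(6, 2)$), but the value of $f(x)$ stays equal to 6, as it must for a quantity that exists independently of the coordinate frame.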

A.3. Vector products and metrics
A.3.1. Inner product

An inner product on the vector space $\mathbb{R}^N$ or
$\mathbb{C}^N$ is a function that maps two vectors onto a scalar.
It is commonly designated as $\langle x, y \rangle = c$. Any function
that is (1) linear, (2) conjugate-symmetric
($\langle x, y \rangle = \langle y, x \rangle^*$; for vector spaces
over $\mathbb{R}$ this is simply symmetric), and (3)
positive-definite ($\langle x, x \rangle > 0$ for all $x \neq 0$)
can be adopted as an inner product.

The dot product on Euclidean space $\mathbb{R}^N$,

$$x \cdot y = \sum_{i=1}^{N} x^i y^i,$$

is an example of an inner product. In fact, since this paper (and 
other RIME-related literature) already uses angle brackets to de- 
note averaging in time and frequency, I shall instead use the dot 
notation for inner products in general. 

14 For a simple example, consider a basis $E'$ that is simply $E$ scaled
up by a factor of 2: $e'_i = 2e_i$. In the $E'$ frame, a vector's
coordinates will be half of those in $E$.

In matrix notation, the general form of an inner product on
$\mathbb{C}^N$ spaces is $x \cdot y = y^H M x$, where $M$ is a
positive-definite Hermitian $N \times N$ matrix. In order for this
product to remain invariant under coordinate transform, the $M$
matrix must transform as $M' = A^H M A$ (where $A$ is the
transformation matrix defined in the previous section), i.e.
doubly-covariantly. Looking ahead to Sect. A.4, this is an example
of a (0,2)-type tensor. Another way to look at it is that the inner
product defines a metric on the vector space, and $M$ gives the
covariant metric tensor, commonly designated as $M = g_{ij}$. In
Einstein notation, the inner product of $x^i$ and $y^j$ is then
$g_{ij}\, x^i\, \overline{y^j}$, where $\overline{y^j}$ is the complex
conjugate of $y^j$.

Given a coordinate system, any choice of Hermitian
positive-definite $M$ induces an inner product and a metric. In
particular, choosing $M$ to be unity results in a natural metric of
$x \cdot y = y^H x$, which is in fact the one implicitly used in all
RIME literature to date. Note, however, that if we change coordinate
systems, the metric only remains natural if $A$ is a unitary
transformation ($A^H A = 1$). For example, a rigid rotation of the
coordinate frame is unitary and thus preserves the metric, while a
skew of the coordinates changes the metric. The other coordinate
transform commonly encountered in the RIME, that between linear
$xy$ and circularly-polarized $rl$ coordinates, is also unitary.

In tensor notation, the Kronecker delta $\delta_{ij}$
($\delta_{ij} = 1$ for $i = j$, and 0 for $i \neq j$) is often used
to indicate a unity $M$, i.e. as $M = g_{ij} = \delta_{ij}$.
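
The role of the metric under a change of coordinates can be illustrated with a small numerical sketch. Everything in it is an assumption chosen for illustration: a skew and a 90-degree rotation as example transforms, integer-valued complex vectors so that the arithmetic is exact, and hand-written inverses. It checks that the inner product survives a skew only if the metric is transformed as $M' = A^H M A$, while a unitary rotation leaves the natural metric intact.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matvec(M, x):
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]

def hermitian(M):
    """Conjugate transpose."""
    return [[M[r][c].conjugate() for r in range(len(M))] for c in range(len(M[0]))]

def inner(x, y, M):
    """Inner product x . y = y^H M x for the metric M."""
    Mx = matvec(M, x)
    return sum(y[r].conjugate() * Mx[r] for r in range(len(y)))

def transformed_metric(M, A):
    """Doubly-covariant rule M' = A^H M A."""
    return matmul(matmul(hermitian(A), M), A)

identity = [[1, 0], [0, 1]]
x = [1 + 2j, 3 + 0j]
y = [2 + 0j, 1 - 1j]

skew, skew_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]   # not unitary
rotation = [[0, -1], [1, 0]]                            # unitary
rotation_inv = hermitian(rotation)

M_skew = transformed_metric(identity, skew)
M_rot = transformed_metric(identity, rotation)

original = inner(x, y, identity)
x_s, y_s = matvec(skew_inv, x), matvec(skew_inv, y)
with_metric = inner(x_s, y_s, M_skew)      # metric transformed along with the frame
naive = inner(x_s, y_s, identity)          # natural metric wrongly kept after a skew
rotated = inner(matvec(rotation_inv, x), matvec(rotation_inv, y), identity)
```

Here `with_metric` and `rotated` reproduce the original value, whereas `naive` does not: after the skew, the metric is $M' = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ rather than unity.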

A.3.2. Index lowering and conjugate covectors 

An inner product induces a natural mapping (isomorphism)
between $V$ and $V^*$, i.e. a pairing up of vectors and covectors.
For any vector $y$, its conjugate covector $\bar{y}$ can be defined
as the linear functional $\bar{y}(w) = w \cdot y$. On $\mathbb{C}^N$
space, this function is given by $\bar{y}(w) = y^H M w$, meaning
simply that

$$\bar{y} = y^H M, \quad\text{or}\quad \bar{y}_i = g_{ij}\, \overline{y^j},$$

in matrix or Einstein notation, respectively. This operation
is also known as index lowering. In a coordinate frame with the
natural metric $\delta_{ij}$, index lowering is just the Hermitian
transpose:

$$\bar{y} = y^H, \quad\text{or}\quad \bar{y}_i = \overline{y^i}.$$

For conciseness, this paper uses the notation $[y^i]^*$ or
$\bar{y}_i$ (i.e. bar over symbol only) to denote the conjugate
covector (or its $i$-th component) of the vector $y$. Note how this
is distinct from the complex conjugate of the $i$-th component,
which is denoted by a bar over both the symbol and all indices, e.g.
$\overline{y^i}$.

A.3.3. Outer product

Given two vector spaces $V$ and $W$ and the dual space $W^*$, the
outer product of the vector $x \in V$ and the covector
$y^* \in W^*$, denoted as $B = x \otimes y^*$, produces a linear
transform between $W$ and $V$ (which, in other words, is a matrix),
that is defined as:

$$B(w) = x\, y^*(w), \quad\text{or}\quad [B(w)]^i = x^i y_j w^j,$$

i.e. the function given by $y^*$ is applied to $w$ (producing a
scalar), which is then multiplied by the vector $x$.

Intrinsically, the outer product is defined on a vector and a
covector. If we also have an inner product on $W$, we can use it
to define the outer product operation on two vectors, as
$x \otimes y \equiv x \otimes \bar{y}$, where $\bar{y}$ is the
conjugate covector of $y$ (see above).

Given a coordinate system in a complex vector space, the $\bar{y}$
covector corresponds to the linear function $\bar{y}(w) = y^H M w$,
and the outer product $B = x \otimes y$ is then

$$B(w) = x y^H M w, \quad\text{or}\quad [B(w)]^i = g_{jk}\, x^i\, \overline{y^k}\, w^j.$$

In other words, the outer product of $x$ and $y$ is given by the
matrix product $x y^H M$; in Einstein notation the corresponding
matrix components are $b^i_j = g_{jk}\, x^i\, \overline{y^k}$.

Consider now a change of coordinates given by the transformation
matrix $A$. In the new coordinate system, the outer product
becomes

$$A^{-1} x y^H M A, \quad\text{or}\quad b'^m_n = \tilde{a}^m_i\, a^j_n\, g_{jk}\, x^i\, \overline{y^k},$$

i.e. is transformed both contra- and covariantly. This is an
example of a (1,1)-type tensor.

A.4. Tensors 

A tensor is a natural generalization of the vector and matrix
concepts. In the coordinate approach, an $(n, m)$-type tensor over
the vector space $V = F^N$ is given by an $(n + m)$-dimensional
array of scalars (from $F$). A tensor is written using $n$ upper and
$m$ lower indices, e.g.:
$T^{i_1 i_2 \ldots i_n}_{j_1 j_2 \ldots j_m}$. The rank of the tensor
is $n + m$. A vector (column vector) $x^i$ is a (1,0)-type tensor, a
covector (row vector) $y_j$ is a (0,1)-type tensor, and a matrix is
typically a (1,1)-type tensor. Note that the range of each tensor
index is, implicitly, from 1 to $N$, where $N$ is the rank of the
original vector space.

The upper indices correspond to contravariant components,
and the lower indices to covariant components. Under a change
of coordinates given by $A = a^i_j$ ($A^{-1} = \tilde{a}^i_j$), the
components of the tensor transform $n$ times contravariantly and
$m$ times covariantly:

$$T'^{\,i'_1 \ldots i'_n}_{j'_1 \ldots j'_m} = \tilde{a}^{i'_1}_{i_1} \cdots \tilde{a}^{i'_n}_{i_n}\, a^{j_1}_{j'_1} \cdots a^{j_m}_{j'_m}\, T^{i_1 \ldots i_n}_{j_1 \ldots j_m}. \qquad \text{(A.1)}$$

In the case of a Jones matrix (a (1,1)-type tensor), this rule
corresponds to the familiar$^{15}$ matrix transformation rule of
$J' = A^{-1} J A$. Note that on the other hand, the metric $M$ used
to specify the inner product (Sect. A.3.1) transforms differently,
being a (0,2)-type tensor.

As far as typographical conventions go, this paper uses
sans-serif capitals ($\mathsf{T}$) to indicate tensors in general,
and lower-case italics ($x^i$, $y_j$) for vectors and covectors.

In the intrinsic (abstract) definition, a tensor is simply a
linear function mapping $m$ vectors and $n$ covectors onto a scalar.
This can be written as

$$\mathsf{T} : \underbrace{V \times \cdots \times V}_{m\ \text{times}} \times \underbrace{V^* \times \cdots \times V^*}_{n\ \text{times}} \to F,$$

or as $\mathsf{T}(x_1, \ldots, x_m, y^*_1, \ldots, y^*_n) = c$. All
the coordinate transform properties then follow from this one basic
definition.

As a useful exercise, consider that a (1,1)-type tensor, which
by this definition is a linear function
$\mathsf{T} : V \times V^* \to F$, can be easily recast as a linear
transform of vectors. For any vector $v$, consider the corresponding
function $v'(w^*)$, operating on covectors, defined as
$v'(w^*) = \mathsf{T}(v, w^*)$. This is a linear function mapping
covectors to scalars, which as we know (see Sect. A.2) is equivalent
to a vector. We have therefore specified a linear transform between
$v$ and $v'$. On the other hand, we know that the latter can also be
specified as a matrix, and therefore a matrix is a (1,1)-type
tensor.

This line of reasoning becomes almost tautological if one
writes out the coordinate components in Einstein notation. The
result of $\mathsf{T}$ applied to the vector $v^j$ and the covector
$w_i$ is a scalar given by the sum

$$\mathsf{T}(v^j, w_i) = T^i_j v^j w_i = (T^i_j v^j)\, w_i.$$

Dropping $w_i$ and evaluating just the sum $T^i_j v^j$ for each $i$
results in $N$ scalars, which are precisely the components of
$v'^i$:

$$v'^i = T^i_j v^j,$$

which on the other hand is just multiplication of a matrix by a
column vector, written in Einstein notation.

15 Note that in Paper I (Smirnov 2011a, Sect. 6.3) this shows up as
$J_T = T J_S T^{-1}$, since the $T$ matrix defined therein is exactly
the inverse of $A$ here.

A.4.1. Transposition and tensor conjugation 

As a purely formal operation, transposition (in tensor notation)
can be defined as a swapping of upper and lower indices:

$$[v^i]^T = v_i, \quad [A^i_j]^T = A^j_i,$$

and the Hermitian transpose can be defined by combining this
with complex conjugation. However, in the presence of a
non-natural metric (Sect. A.3.1), this is a purely mechanical
operation with no underlying mathematical meaning, since it turns
e.g. a vector into an entirely unrelated covector.

A far more meaningful operation is given by the index lowering
procedure of Sect. A.3.2, used to obtain conjugate covectors:
$\bar{v}_i = g_{ij}\, \overline{v^j}$, and by its counterpart, index
raising: $\bar{w}^i = g^{ij}\, \overline{w_j}$, where $g^{ij}$ is the
contravariant metric tensor (essentially the inverse of $g_{ij}$).
This kind of tensor conjugation can be generalized to the matrix
case as:

$$\bar{A}^j_i = g_{ik}\, g^{jl}\, \overline{A^k_l}.$$

In the case of a natural metric $g_{ij} = \delta_{ij}$ (and only in
this case), tensor conjugation is the same as a mechanical
Hermitian transpose.

For conciseness, this paper uses the bar notation to denote
tensor conjugation, i.e. $\bar{x}_i$ is the conjugate covector of
the vector $x^i$.

A.5. Tensor operations and the Einstein notation 

Einstein notation allows for some wonderfully compact
representations of linear operations on tensors, which result in
other tensors. Some of these were already illustrated above:

Inner product: $g_{ij}\, x^i\, \overline{y^j} = c$, resulting in a
scalar.

Index lowering: $\bar{y}_i = g_{ij}\, \overline{y^j}$, converting a
(1,0)-type tensor (a vector) into a (0,1)-type tensor (its
conjugate covector).

Outer product of a vector and a covector, $b^i_j = x^i y_j$,
resulting in a (1,1)-type tensor (a matrix).

Matrix multiplication of a matrix by a vector, resulting in
another vector: $v'^i = T^i_j v^j$.

Consider now multiplication of two matrices, which in
Einstein notation can be written as

$$C^i_j = A^i_k B^k_j.$$

Here, $k$ is a summation index (since it is repeated), while $i$ and
$j$ are free indices. Free indices propagate to the left-hand side
of the expression. This is a very easy formal rule for keeping
track of what the type of the result of a tensor operation is. For
example, the result of $A^j_k B^k C_{ij}$ is a (0,1)-type tensor,
$d_i$ (i.e. just a humble covector), since $i$ is the only free
index in the expression.

In complicated expressions, a useful convention is to use
Greek letters for the summation indices, and Latin ones for the
free indices: $A^i_\alpha B^\alpha_j$. This makes the expressions
easier to read, but is not always easy to follow consistently. This
paper tries to follow this convention as much as possible when
recasting the RIME in tensor form.

The true power of Einstein notation is that it establishes rel- 
atively simple formal rules for manipulating tensor expressions. 
These rules can help reduce complex expressions to manageable 
forms. 

A.5.1. No duplicate indices in the same position 

Consider the expression $x^i y^i$. The index $i$ is nominally free,
so can we treat the result as a (1,0)-type tensor
$z^i = x^i y^i$? The answer is no, because $z^i$ does not transform
as a (1,0)-type tensor. Under change of coordinates, we have

$$z'^i = x'^i y'^i = (\tilde{a}^i_j x^j)(\tilde{a}^i_k y^k) = \tilde{a}^i_j\, \tilde{a}^i_k\, x^j y^k,$$

which is not a single contravariant transform. In fact, $z^i$
transforms as a (2,0)-type tensor.

Summation indices cannot appear multiple times either: the
expression $W^i_j = X^i_\alpha Y^\alpha Z^\alpha_j$ is not a valid
(1,1)-type tensor!

In general, any expression in Einstein notation will not yield
a valid tensor if it contains repeated indices in the same (upper
or lower) position. However, in this paper I make use of
mixed-dimension tensors (Sect. A.6.2), with a restricted set of
coordinate transforms, which results in some indices being
effectively invariant. Invariant indices can be repeated.
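
The bookkeeping rules for free and summation indices are mechanical enough to be sketched in a few lines of Python. The function below is a hypothetical helper, not part of any existing package: each term of a product is described by a pair of strings holding its upper and lower index letters, and the function returns the free upper and lower indices of the result, or raises an error if an index is repeated in the same position. For simplicity it ignores the relaxation for invariant indices mentioned above.

```python
from collections import Counter

def result_type(terms):
    """Return (free_upper, free_lower) for a product of tensors.

    `terms` is a list of (upper, lower) index strings, one pair per tensor,
    e.g. [("i", "k"), ("k", "j")] for the matrix product A^i_k B^k_j.
    """
    upper = Counter(idx for up, _ in terms for idx in up)
    lower = Counter(idx for _, lo in terms for idx in lo)
    repeated = [idx for counts in (upper, lower)
                for idx, count in counts.items() if count > 1]
    if repeated:
        raise ValueError(f"index repeated in the same position: {sorted(set(repeated))}")
    # An index present in both positions is summed over; the rest are free.
    free_upper = sorted(idx for idx in upper if idx not in lower)
    free_lower = sorted(idx for idx in lower if idx not in upper)
    return free_upper, free_lower

matrix_product = result_type([("i", "k"), ("k", "j")])        # A^i_k B^k_j
covector = result_type([("j", "k"), ("k", ""), ("", "ij")])   # A^j_k B^k C_ij
```

The matrix product comes out as a (1,1)-type result with free indices $i$ (upper) and $j$ (lower), and the second product as a (0,1)-type result with the single free index $i$. Calling `result_type([("i", ""), ("i", "")])`, i.e. $x^i y^i$, raises a `ValueError`.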

A.5.2. Commutation 

The terms in an Einstein sum commute! For any particular set
of index values, each term in the sum represents a scalar, and the
scalars as a whole make up one product in some complicated
nested sum, and scalars commute. For example, the matrix
product $AB = A^i_\alpha B^\alpha_j$ can be rewritten as
$B^\alpha_j A^i_\alpha$ without changing the result. Were we to swap
the matrices themselves around, the Einstein sum would become
$B^i_\alpha A^\alpha_j \neq B^\alpha_j A^i_\alpha$. Note how the
relative position (upper vs. lower) of the summation index changes.
In one case we're summing over columns of $B$ and rows of $A$, in
the other case over columns of $A$ and rows of $B$.
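
A short numerical check, with arbitrary integer matrices chosen purely for illustration, makes the distinction concrete: reordering the factors inside the sum changes nothing, while exchanging the index patterns of the two matrices gives a different product.

```python
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
n = 2

# A^i_a B^a_j: the ordinary matrix product AB.
AB = [[sum(A[i][a] * B[a][j] for a in range(n)) for j in range(n)] for i in range(n)]

# B^a_j A^i_a: same index pattern, factors written in the other order.
AB_reordered = [[sum(B[a][j] * A[i][a] for a in range(n)) for j in range(n)]
                for i in range(n)]

# B^i_a A^a_j: the index pattern itself is swapped, giving the product BA.
BA = [[sum(B[i][a] * A[a][j] for a in range(n)) for j in range(n)] for i in range(n)]
```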

A.5.3. Conjugation 

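The tensor conjugate of a product is a product of the conjugates,
with upper and lower indices swapped:

$$y^i = A^i_\alpha x^\alpha, \quad \bar{y}_i = [A^i_\alpha x^\alpha]^* = \bar{A}^\alpha_i\, \bar{x}_\alpha,$$

$$C^i_j = A^i_\alpha B^\alpha_j, \quad \bar{C}^j_i = [A^i_\alpha B^\alpha_j]^* = \bar{A}^\alpha_i\, \bar{B}^j_\alpha.$$

This follows from the definition of conjugation in
Sect. A.4.1 and the commutation considerations above.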

A.5.4. Isolating sub-products and collapsing indices

Summation indices can be "collapsed" by isolating intermediate
products that contain all occurrences of that index. For example,
in the sum $A^i_\alpha B^\alpha_\beta C^\beta_j$, the index $\beta$
can be collapsed by defining the intermediate product
$E^\alpha_j = B^\alpha_\beta C^\beta_j$. The sum then becomes
simply a matrix product, $A^i_\alpha E^\alpha_j$. It is important
that the isolated sub-product contain all occurrences of the index.
For example, if a further term $D$ also carried the index $\beta$,
it would be formally incorrect to isolate the sub-product
$F^\alpha_j = B^\alpha_\beta C^\beta_j$, since the sum then becomes
$A^i_\alpha F^\alpha_j D^\beta$. The "loose" $\beta$ index on $D$
then changes the type of the result.

A.6. Mapping the RIME onto tensors 

This section contains some in-depth mathematical details (that 
have been glossed over in the main paper) pertaining to how 
the concepts of the RIME map onto formal definitions of tensor 
theory. 

A.6.1. Coherency as an outer product

The outer product operation is crucial to the RIME, since it
is used to characterize the coherency of two EMF or voltage
vectors. In tensor terms, the outer product can be defined
(Sect. A.3.3) in a completely abstract and coordinate-free way.
By contrast, the definition usually employed in physics
literature, and the RIME literature in particular, consists of a
formal, mechanical manipulation of vectors (in the list-of-numbers
sense). In particular, to derive the 4x4 formalism, Hamaker et al.
(1996) (see also Paper I, Smirnov 2011a, Sect. 6.1) used the
Kronecker product form:

$$x \otimes y^* = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \otimes \begin{pmatrix} y_1^* \\ y_2^* \end{pmatrix} = \begin{pmatrix} x_1 y_1^* \\ x_1 y_2^* \\ x_2 y_1^* \\ x_2 y_2^* \end{pmatrix},$$

while the Jones formalism emerged (Hamaker 2000; Smirnov 2011a,
Sect. 1) by using the matrix product $xy^H$ instead. Note that
while the former operation produces 4-vectors and the latter 2x2
matrices, the two are isomorphic. It is important to establish
whether an outer product defined in this way is fully equivalent to
that defined in tensor theory.

In fact, by defining the outer product as $xy^H$ in a specific
coordinate system, we're implicitly postulating a natural metric
($M = \delta_{ij}$) in that coordinate system. This is of no
consequence if only a single coordinate system is used, or if we
restrict ourselves to unitary coordinate transformations, as is the
case for transformations between $xy$ linearly polarized and $rl$
circularly polarized coordinates (such coordinate frames are called
mutually unitary). It is something to be kept in mind, however, if
formulating the RIME in a coordinate-free way.

An outer product given by $V = xy^H$ in a specific coordinate
system transforms as $A^{-1} V A$ under change of coordinates,
just like Jones matrices do. Alternatively, we may mechanically
define an outer product-like operation as $W = xy^H$ in all
coordinate systems, and this would then transform as
$A^{-1} W [A^{-1}]^H$. The two definitions are only equivalent under
unitary coordinate transformations! This is an easily overlooked
point, most recently missed by the author of this paper: in Paper I
(Smirnov 2011a, Sect. 6.3), only the second transform is given for
coherency matrices, with no mention of the first. It may be
somewhat academic in practice, since all applications of the RIME
to date have restricted themselves to the mutually unitary $xy$ and
$rl$ coordinate frames, but it may be relevant for future
developments.
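
The difference between the two transformation rules is easy to demonstrate numerically. The sketch below uses assumed example values throughout: integer-valued complex vectors, a 90-degree rotation as the unitary case and a skew as the non-unitary case, with their inverses written out by hand. It compares the (1,1)-tensor rule $A^{-1} V A$ with the outer product recomputed mechanically from the transformed components.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matvec(M, x):
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]

def hermitian(M):
    return [[M[r][c].conjugate() for r in range(len(M))] for c in range(len(M[0]))]

def outer(x, y):
    """Mechanical outer product x y^H."""
    return [[xi * yj.conjugate() for yj in y] for xi in x]

def tensor_rule(V, A, A_inv):
    """Transform V as a (1,1)-type tensor: A^{-1} V A."""
    return matmul(matmul(A_inv, V), A)

def mechanical_rule(x, y, A_inv):
    """Recompute x y^H from the contravariantly transformed components."""
    return outer(matvec(A_inv, x), matvec(A_inv, y))

x = [1 + 0j, 2j]
y = [1 + 0j, 1 + 0j]
V = outer(x, y)

rotation, rotation_inv = [[0, -1], [1, 0]], [[0, 1], [-1, 0]]   # unitary
skew, skew_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]            # not unitary

unitary_agree = tensor_rule(V, rotation, rotation_inv) == mechanical_rule(x, y, rotation_inv)
skew_agree = tensor_rule(V, skew, skew_inv) == mechanical_rule(x, y, skew_inv)

# The mechanical definition follows the second rule, A^{-1} W [A^{-1}]^H.
second_form = matmul(matmul(skew_inv, V), hermitian(skew_inv))
second_form_matches = second_form == mechanical_rule(x, y, skew_inv)
```

For the rotation the two rules give identical matrices; for the skew they do not, and the mechanically recomputed product instead matches $A^{-1} W [A^{-1}]^H$.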



A.6.2. Mixed-dimensionality tensors 

Under the strict definition, a tensor is associated with a single
vector space $F^N$, and so must be represented by an
$N \times N \times \ldots \times N$ array of scalars. In other
words, all its indices must have the same range from 1 to $N$.

Applied literature (and the present paper in particular) often
makes use of mixed-dimensionality tensors (MDTs), i.e. arrays
of numbers with different dimensions. In particular, Sect. 3.2
introduces the visibility MDT $V^{pi}_{qj}$ with nominal dimensions
of $N \times N \times 2 \times 2$. Such entities are not proper
tensors in the strict definition, so we should formally establish
to what extent they can be treated as such. The point is not
entirely academic. Einstein summation (or any of the other
operations discussed above) can be mechanically applied to
arbitrary arrays of numbers, but the results are not guaranteed to
be self-consistent under change of coordinates unless it can be
formally established that they behave like tensors. Conversely, if
we can formally establish that some operation yields a tensor of
type $(n, m)$, then we know exactly how to transform it.

At first glance, MDTs seem to be significantly different from
proper tensors. The difficulty lies in the fact that they seem to
have two categories of indices. For example, $V^{pi}_{qj}$ has
"2-indices" (or "3-indices") $i, j$, associated with the vector
space $\mathbb{C}^2$ or $\mathbb{C}^3$, with respect to which the
MDT behaves like a proper tensor, and "$N$-indices" like $p$ and
$q$, which only serve to "bundle" lower-ranked tensors together.
$V^{pi}_{qj}$ really behaves like a bundle of matrices (rank-2
tensors) rather than a proper rank-4 tensor. In particular, the
coordinate transforms we normally consider involve the 2-indices
only, and not the $N$-indices, so $V^{pi}_{qj}$ really transforms
like a (1,1)-type tensor. Fortunately, it turns out that MDTs can
be mapped to proper tensors in a mathematically rigorous way.

Let's assume we have a vector space like $\mathbb{C}^2$ (which
we'll call the core space), with a core metric of $g_{ij}$, and MDTs
with a combination of 2-indices and $N$-indices. For illustration,
consider the simpler case of the matrix $W^i_p$, which (under the
above terminology) is really a bundle of $N$ 2-vectors:

$$W = \begin{pmatrix} w^1_1 & w^1_2 & \cdots & w^1_N \\ w^2_1 & w^2_2 & \cdots & w^2_N \end{pmatrix}.$$

Let's formally map MDTs to conventional tensors over
$\mathbb{C}^{2+N}$ space as follows:

$$\mathsf{V}^{pi}_{qj} = \begin{cases} V^{(p-2)\,i}_{(q-2)\,j}, & p > 2,\ q > 2,\ i \le 2,\ j \le 2 \\ 0, & \text{otherwise.} \end{cases} \qquad \text{(A.2)}$$

This can be generalized to any mix of 2- and $N$-indices. We'll
use the term 2-restricted tensor for any tensor over
$\mathbb{C}^{2+N}$ whose components are null if any $N$-index is
equal to 2 or less, or any 2-index is above 2. Note that this
mapping from MDTs to 2-restricted tensors is isomorphic: every
2-restricted tensor over $\mathbb{C}^{2+N}$ has a unique MDT
counterpart.

For the matrix $W^i_p$, the mapping procedure effectively pads
it out with nulls to make a $(N + 2) \times (N + 2)$ matrix:

$$\mathsf{W} = \begin{pmatrix} 0 & 0 & w^1_1 & \cdots & w^1_N \\ 0 & 0 & w^2_1 & \cdots & w^2_N \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}. \qquad \text{(A.3)}$$

In a sense, this procedure partitions the dimensions of the
$\mathbb{C}^{2+N}$ space into the core dimensions (the first two),
and the "bundling" dimensions (the other $N$). Formally, all the
indices of $\mathsf{V}$ and $\mathsf{W}$ range from 1 to $N + 2$,
but 2-restricted tensors are constructed in such a way that
components whose indices are "out of range" are all null.

Now let's also map the coordinate transforms of the core
space $\mathbb{C}^2$ onto a subset of the coordinate transforms of
$\mathbb{C}^{2+N}$, using a transformation matrix of the form

$$A = \begin{pmatrix} a^1_1 & a^1_2 & 0 & \cdots & 0 \\ a^2_1 & a^2_2 & 0 & \cdots & 0 \\ 0 & 0 & 1 & & 0 \\ \vdots & \vdots & & \ddots & \\ 0 & 0 & 0 & & 1 \end{pmatrix} \quad (\text{i.e. } a^i_j = \delta^i_j \text{ if } i \text{ or } j > 2), \qquad \text{(A.4)}$$

where $\delta^i_j$ is the Kronecker delta. The $\mathsf{W}$ matrix
transforms as $A^{-1} \mathsf{W} A$; it is easy to verify that it
retains the same padded layout as Eq. (A.3) under such restricted
coordinate transforms, and that the upper-right block of the padded
matrix actually transforms to $A_{(2)}^{-1} W$, where $A_{(2)}$ is
the upper-left 2x2 corner of $A$ (giving the original transform of
$\mathbb{C}^2$). In other words, every vector $w_j$ of the bundle
transforms as $A_{(2)}^{-1} w_j$, exactly as vectors over the core
space do!
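
This can indeed be verified with a few lines of code. The following sketch makes some illustrative assumptions: a bundle of $N = 3$ two-component vectors with integer entries, a skew as the core transform (its inverse written out by hand), and hypothetical helper names. It pads the bundle, extends the core transform with an identity block, and compares $A^{-1} \mathsf{W} A$ with the padded version of the bundle transformed in the core space alone.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def pad_bundle(W):
    """Embed a 2 x N bundle into the upper-right block of an (N+2) x (N+2) matrix."""
    n = len(W[0])
    padded = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(2):
        for p in range(n):
            padded[i][2 + p] = W[i][p]
    return padded

def extend_transform(core, n):
    """Block-diagonal matrix: 2 x 2 core transform, identity on the N bundling dimensions."""
    A = [[1 if r == c else 0 for c in range(n + 2)] for r in range(n + 2)]
    for r in range(2):
        for c in range(2):
            A[r][c] = core[r][c]
    return A

W = [[1, 2, 3], [4, 5, 6]]        # bundle of three 2-vectors (columns)
A_core = [[1, 1], [0, 1]]         # core transform (a skew)
A_core_inv = [[1, -1], [0, 1]]    # its inverse, written out by hand

A = extend_transform(A_core, 3)
A_inv = extend_transform(A_core_inv, 3)

transformed = matmul(matmul(A_inv, pad_bundle(W)), A)   # full-space rule
expected = pad_bundle(matmul(A_core_inv, W))            # core-space rule, then pad
```

The two results coincide: the padded layout is preserved, and each column of the bundle has simply been multiplied by the inverse of the core transform.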

This property generalizes to higher-rank tensors. For example,
the 2-restricted tensor $V^{pi}_{qj}$ should formally transform as
(see Eq. (A.1)):

$$V'^{\,pi}_{qj} = \tilde{a}^p_\sigma\, \tilde{a}^i_\alpha\, a^\tau_q\, a^\beta_j\, V^{\sigma\alpha}_{\tau\beta}.$$

However, for $p \le 2$ we have $V^{pi}_{qj} = 0$ by definition,
while for $p > 2$, $\tilde{a}^p_\sigma = 1$ for $p = \sigma$, and is
null otherwise. Therefore, only the $p = \sigma$ (and, similarly,
$q = \tau$) terms contribute to the sum above. Thus,

$$V'^{\,pi}_{qj} = \tilde{a}^i_\alpha\, a^\beta_j\, V^{p\alpha}_{q\beta},$$

so our nominally (2,2)-type tensor $V^{pi}_{qj}$ behaves exactly
like a (1,1)-type tensor under any coordinate transform given by
Eq. (A.4). The $p$ and $q$ indices can be called invariant (as
opposed to co- or contravariant). In general, any 2-restricted
$(n, m)$-type tensor having $(n', m')$ 2-indices always behaves as
an $(n', m')$-type tensor under such coordinate transforms.

For the same reason, we can relax the rules of Einstein
summation to allow repeated invariant indices. For example,
$z^i = x^i y^i$ is not a valid tensor (one index, but transforms
doubly-contravariantly!), but $Z^i_p = X^i_p Y_p$ is a perfectly
valid tensor, since an extra invariant index $p$ does not change
how the components transform.

Furthermore, it is easy to see that a 2-restricted tensor
remains 2-restricted under any coordinate transform of the core
vector space, so the property of being 2-restricted is, in a sense,
intrinsic. Any product of 2-restricted tensors is also 2-restricted.
In other words, 2-restricted tensors form a closed subset under
coordinate transforms of the core vector space, and under
all product operations. To complete the picture, we can define
a metric in $\mathbb{C}^{2+N}$ by using the core metric for the core
dimensions, and the natural metric for the "bundling" dimensions,
so that tensor conjugation is expressed in terms of the core metric
only:

$$\bar{V}^{qj}_{pi} = g_{i\alpha}\, g^{j\beta}\, \overline{V^{p\alpha}_{q\beta}}.$$

To summarize, we have formally established that MDTs with
a core vector space of $\mathbb{C}^M$ and a second$^{16}$
dimensionality of $N$ can be isomorphically mapped onto the set of
$M$-restricted tensors over $\mathbb{C}^{M+N}$. Under coordinate
transforms of the core vector space, such tensors behave co- and
contravariantly with respect to the $M$-indices, and invariantly
with respect to the $N$-indices. We are therefore entitled to treat
MDTs as proper tensors for the purposes of this paper.

16 Additional dimensionalities may be "mixed in" in the same way.



